A JavaScript library to add search functionality to any Jekyll blog.
Do you have a blog built with Jekyll and want lightweight, purely client-side search functionality?
No server configurations or databases to maintain.
Just 5 minutes to have a fully working searchable blog.
npm install simple-jekyll-search
Place the following code in a file called search.json in the root of your Jekyll blog. (You can also get a copy from /example/search.json in the repository.)
This file will be used as a small data source to perform the searches on the client side:
---
layout: none
---
[
  {% for post in site.posts %}
    {
      "title"    : "{{ post.title | escape }}",
      "category" : "{{ post.category }}",
      "tags"     : "{{ post.tags | join: ', ' }}",
      "url"      : "{{ site.baseurl }}{{ post.url }}",
      "date"     : "{{ post.date }}"
    } {% unless forloop.last %},{% endunless %}
  {% endfor %}
]
SimpleJekyllSearch needs two DOM elements to work:
- a search input field
- a result container to display the results
Here is the code you can use with the default configuration:
You need to place the following code within the layout where you want the search to appear. (See the configuration section below to customize it)
For example in _layouts/default.html:
<!-- HTML elements for search -->
<input type="text" id="search-input" placeholder="Search blog posts..">
<ul id="results-container"></ul>
<!-- or without installing anything -->
<script src="https://unpkg.com/simple-jekyll-search@latest/dest/simple-jekyll-search.min.js"></script>
Customize SimpleJekyllSearch by passing in your configuration options:
var sjs = SimpleJekyllSearch({
searchInput: document.getElementById('search-input'),
resultsContainer: document.getElementById('results-container'),
json: '/search.json'
})
A new instance of SimpleJekyllSearch returns an object with a single property, search.
search is a function that simulates user input and displays the matching results.
E.g.:
var sjs = SimpleJekyllSearch({ ...options })
sjs.search('Hello')
💡 It can be used to filter posts by tags or categories!
Here is a list of the available options, along with usage notes and troubleshooting guides.
searchInput (Element) [required]: the input element on which the plugin should listen for keyboard events to trigger searching and rendering of articles.
resultsContainer (Element) [required]: the container element in which the search results should be rendered. Typically a <ul>.
json (String|JSON) [required]: you can either pass in a URL to the search.json file, or pass the results directly as JSON to save a round trip for the data.
searchResultTemplate (String) [optional]: the template of a single rendered search result.
The templating syntax is very simple: You just enclose the properties you want to replace with curly braces.
E.g.
The template
var sjs = SimpleJekyllSearch({
searchInput: document.getElementById('search-input'),
resultsContainer: document.getElementById('results-container'),
json: '/search.json',
searchResultTemplate: '<li><a href="{url}">{title}</a></li>'
})
will render to the following
<li><a href="/jekyll/update/2014/11/01/welcome-to-jekyll.html">Welcome to Jekyll!</a></li>
If the search.json contains this data
[
{
"title" : "Welcome to Jekyll!",
"category" : "",
"tags" : "",
"url" : "/jekyll/update/2014/11/01/welcome-to-jekyll.html",
"date" : "2014-11-01 21:07:22 +0100"
}
]
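The substitution can be sketched in a few lines. This is an illustration of the templating behavior described above, not the library's actual implementation:

```javascript
// Minimal sketch of curly-brace templating: each {prop} is replaced with
// the matching property from a result object; unknown props are left as-is.
function renderTemplate (template, data) {
  return template.replace(/\{(\w+)\}/g, function (match, prop) {
    return data[prop] !== undefined ? data[prop] : match
  })
}

var post = {
  title: 'Welcome to Jekyll!',
  url: '/jekyll/update/2014/11/01/welcome-to-jekyll.html'
}

var html = renderTemplate('<li><a href="{url}">{title}</a></li>', post)
// html is '<li><a href="/jekyll/update/2014/11/01/welcome-to-jekyll.html">Welcome to Jekyll!</a></li>'
```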
templateMiddleware (Function) [optional]: a function that will be called whenever a match in the template is found.
It gets passed the current property name, property value, and the template.
If the function returns a value other than undefined, that value replaces the property in the template.
This can be useful for manipulating URLs, for example.
Example:
SimpleJekyllSearch({
...
templateMiddleware: function(prop, value, template) {
if (prop === 'bar') {
return value.replace(/^\//, '')
}
}
...
})
See the tests for an in-depth code example
sortMiddleware (Function) [optional]: a function used to sort the filtered results.
It can be used, for example, to group results by section.
Example:
SimpleJekyllSearch({
...
sortMiddleware: function(a, b) {
var astr = String(a.section) + "-" + String(a.caption);
var bstr = String(b.section) + "-" + String(b.caption);
return astr.localeCompare(bstr)
}
...
})
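Stand-alone, the comparator above orders results like this (section and caption are hypothetical fields you would add to your search.json):

```javascript
// Same comparator as in the example, applied to plain objects to show the
// ordering. 'section' and 'caption' are hypothetical search.json fields.
function bySectionThenCaption (a, b) {
  var astr = String(a.section) + "-" + String(a.caption);
  var bstr = String(b.section) + "-" + String(b.caption);
  return astr.localeCompare(bstr)
}

var results = [
  { section: 'Guides', caption: 'Zeta' },
  { section: 'API', caption: 'Alpha' },
  { section: 'Guides', caption: 'Alpha' }
]
results.sort(bySectionThenCaption)
// order: API-Alpha, Guides-Alpha, Guides-Zeta
```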
noResultsText (String) [optional]: the HTML shown if the query didn't match anything.
limit (Number) [optional]: limits the number of posts rendered on the page.
fuzzy (Boolean) [optional]: enables fuzzy search to allow less restrictive matching.
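To illustrate what fuzzy matching means, one common approach is to require the query's characters to appear in order but not necessarily adjacent. This is a sketch of the idea, not necessarily the library's exact algorithm:

```javascript
// Subsequence-style fuzzy matching sketch: the query's characters must
// appear in the text in order, but gaps are allowed. Illustrates "less
// restrictive matching"; the library's own algorithm may differ.
function fuzzyMatch (query, text) {
  var q = query.toLowerCase()
  var t = text.toLowerCase()
  var i = 0
  for (var j = 0; j < t.length && i < q.length; j++) {
    if (t[j] === q[i]) i++
  }
  return i === q.length
}

fuzzyMatch('jkl', 'Jekyll')  // true: j..k..l appear in order
fuzzyMatch('lkj', 'Jekyll')  // false: characters out of order
```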
exclude (Array) [optional]: a list of terms you want to exclude. Each term is matched as a regular expression, so URLs and plain words are both allowed.
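Since each term is matched against a regex, an exclude list can drop entries like this (a sketch of the matching only, with hypothetical patterns; not the library's internal code):

```javascript
// Sketch of regex-based exclusion. The patterns are hypothetical; the point
// is that each term is compiled to a RegExp and tested against a value.
var exclude = ['^/drafts/', 'welcome']

function isExcluded (term) {
  return exclude.some(function (pattern) {
    return new RegExp(pattern).test(term)
  })
}

isExcluded('/drafts/wip.html')        // true: matches ^/drafts/
isExcluded('/jekyll/welcome.html')    // true: contains "welcome"
isExcluded('/jekyll/other-post.html') // false
```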
success (Function) [optional]: a function called once the data has been loaded.
debounceTime (Number) [optional]: limits how often the search function can execute over a given time window. This is especially useful for improving the user experience when searching over a large dataset (either with rare terms or when many posts are displayed). If no debounceTime (in milliseconds) is provided, a search is triggered on each keystroke.
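The effect of a debounce can be sketched independently of the plugin (an illustration of the concept, not the library's internal implementation):

```javascript
// Debounce sketch: rapid calls are collapsed so fn runs only once,
// `wait` milliseconds after the last call.
function debounce (fn, wait) {
  var timer
  return function () {
    var args = arguments
    clearTimeout(timer)
    timer = setTimeout(function () { fn.apply(null, args) }, wait)
  }
}

var calls = 0
var debouncedSearch = debounce(function () { calls++ }, 100)

debouncedSearch('H')
debouncedSearch('He')
debouncedSearch('Hel')
// calls is still 0 at this point; the search runs once,
// about 100 ms after the last keystroke
```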
If search isn't working due to invalid JSON: there is a filter plugin in the _plugins folder which removes most characters that cause invalid JSON. To use it, add the simple_search_filter.rb file to your _plugins folder and use remove_chars as a filter.
For example, in search.json, replace
"content": "{{ page.content | strip_html | strip_newlines }}"
with
"content": "{{ page.content | strip_html | strip_newlines | remove_chars | escape }}"
If this doesn’t work when using GitHub Pages, you can try jsonify to make sure the content is JSON-compatible:
"content": {{ page.content | jsonify }}
Note: you don’t need to wrap this in quotes (") since jsonify inserts them automatically.
Replace search.json with the following code:
---
layout: none
---
[
,
{
"title" : "Page Not Found",
"category" : "",
"tags" : "",
"url" : "/404.html",
"date" : "",
"content" : "Sorry, but the page you were trying to view does not exist — please consult our course contents."
} ,
{
"title" : "Quantifying the Interaction Energy Between the SARS-CoV-2 Spike Protein and ACE2",
"category" : "",
"tags" : "",
"url" : "/coronavirus/NAMD",
"date" : "",
"content" : "Energy functions measure quality of protein bindingIn the previous lesson, we visualized the RBD-ACE2 complexes formed by SARS-CoV and SARS-CoV-2 and examined three regions of conformational differences (a loop site in the ACE2 binding ridge, hotspot 31, and hotspot 353). We presented qualitative explanations from the literature as to why these differences may help SARS-CoV-2 bind more strongly to the human ACE2 enzyme, but a theme of this course is to justify our arguments quantitatively. Our question, then, is how to measure the strength of protein binding in a local region of the complex.In part 1 of this module, we searched for the tertiary structure that best “explains” a protein’s primary structure by looking for the structure with the lowest potential energy (i.e., the one that is the most chemically stable). In part, this means that we were looking for the structure that incorporates many attractive bonds present and few repulsive bonds.To quantify whether two molecules bond well, we will borrow this idea and compute the potential energy of the complex formed by the two molecules. If two molecules bond together tightly, then the complex will have a very low potential energy. In turn, if we compare the SARS-CoV-2 RBD-ACE2 complex against the SARS-CoV RBD-ACE2 complex, and we find that the potential energy of the former is significantly smaller, then we can conclude that it is more stable and therefore bonded more tightly. This result would provide strong evidence for increased infectiousness of SARS-CoV-2.In the following tutorial, we will compute the energy of the bound spike protein-ACE2 complex for the two viruses and see how the three regions we identified in the previous lesson contribute to the total energy of the complex. To do so, we will employ NAMD, a program that was designed for high-performance large system simulations of biological molecules and is most commonly used with VMD via a plugin called NAMD Energy. 
This plugin will allow us to isolate a specific region to evaluate how much this local region contributes to the overall energy of the complex.Visit tutorialDifferences in interaction energy with ACE2 between SARS and SARS-CoV-2Using the methods described in the tutorial, we calculated the interaction energies for each of the three regions of interest as well as for the total energy of the complexes for both SARS-CoV and SARS-CoV-2. The results of this analysis are shown in the table below.ACE2 interaction energies of the chimeric SARS-CoV-2 RBD and SARS RBD. The PDB files contain two biological assemblies, or instances, of the corresponding structure. The first instance includes chain A (ACE2) and chain E (RBD), and the second instance includes chain B (ACE2) and chain F (RBD). The overall interactive energies between the RBD and ACE2 are shown in the first two rows (green). Then, the individual interaction energies are shown from the loop site (yellow), hotspot 31 (red), and hotspot 353 (grey). Total energy is computed as the sum of electrostatic interactions and van der Waals (vdW) forces.We can see in the table that the overall attractive interaction energy between the RBD and ACE2 is lower for SARS-CoV-2 than for SARS-CoV, which supports previous studies that have found the SARS-CoV-2 spike protein to have higher affinity with ACE2.Furthermore, all of the three regions of interest have a lower total energy in SARS-CoV-2 than in SARS-CoV, with hotspot 31 (red) having the greatest negative contribution. We now have quantitative evidence that the conformational changes in the three sites do indeed increase the binding affinity between the spike protein and ACE2.Nevertheless, we should be careful with making strong inferences of the infectiousness of SARS-CoV-2 based on these results. 
To add evidence for our case, we would need biologists to perform additional experimental work to demonstrate that the improved binding of SARS-CoV-2 translates into greater infectiousness in human cells.Another reason for our caution is that proteins are not fixed objects but rather dynamic structures whose shape is subject to small changes over time. In the conclusion to part 2 of this module, we will learn how to analyze the dynamics of a protein’s movements within its environment.Next lesson"
} ,
{
"title" : "NMA Calculations",
"category" : "",
"tags" : "",
"url" : "/coronavirus/NMA",
"date" : "",
"content" : "NMA IntroductionProteins are not static, but rather dynamic structures. These fluctuations in their structures are typically key components their functions. Molecular dynamics (MD) is all about simulating molecules to analyze the movement of the molecules, atoms, and their interactions. However, simulating large structures, such as proteins with hundreds of amino acids, can prove to be extremely computationally heavy. Fortunately, there is an alternative method of studying large-scale movements of these structures called Normal mode analysis (NMA). NMA of proteins is based on the theory that the lowest frequency vibrational normal modes are the most functionally relavent, describing the largest movement within the protein 1.One of the approaches for modeling a molecule is to represent atoms as nodes that are interconnected with springs, otherwise known as an elastic network model (ENM). The motivation of using ENM is that bonds actually share many characteristics with springs. We stated that proteins are not static, but this is true because the bonds that everything together are not static either. Bonds are constantly vibrating, stretching and compressing much like that of a oscillating spring-mass system show below. via GfycatThe bonded atoms are held together by sharing electrons, but is held at specific bond length due to the attraction and repulsion forces of the negatively charged electrons and positively charged nucleus. Just like a spring, when you bring the atoms closer together then the normal (equilibrium), they will resist with greater and greater repulsion force. A popular method for performing NMA is the Gaussian network model (GNM), the ENM for isotropic fluctuations. Isotropic describes physical properties don’t change with direction, meaning that GNM analyzes only the size of the fluctuation in the protein.Besides root-mean-square deviation (RMSD), we can compare protein structures by comparing how the protein fluctuates. 
Two proteins fluctuating differently is typically a clear indication that their internal structures are different. Therefore, we can perform NMA calculations as another approach to comparing the SARS-CoV-2 and SARS S proteins.One of the main strengths of ProDy is its capabilities for protein dynamics analysis. This includes performing NMA and visualizing the results, which provide information on how the protein fluctuates. In this tutorial, we will use ProDy to perform GNM calculations on one chain of the SARS-CoV-2 S protein from the PDB entry 6vxx and visualize the results in various graphs and plots.Visit tutorialGNM Calculation of the Spike ProteinIn the tutorial, we generated four visualizations of how the SARS-CoV-2 S protein fluctuates. Using ProDy, we performed GNM calculations on the SARS S protein using the PDB entry 5xlr. In addition, we also performed the calculations on a single chain of the S protein for a more thorough comparison. Here, we will explain how to interpret the results and compare them to analyze the differences and similarities between the two proteins.Contact MapA protein contact map is a 2D matrix that represents the distances between all pairs of amino acid residues in the protein. In other words, it is essentially a reduced, 2D representation of a protein’s tertiary structure. Contact maps are another popular method of protein structure comparison. Proteins with very similar structures will have very similar contact map patterns, and deviations within the structure can be easily inferred by seeing unique patterns in only one of the proteins. Between all pairs of amino acids, the pair is assigned the value of 1 if the two residues are closer together than a predetermined threshold distance, and 0 otherwise. The threshold for the maps below is 20 Å, meaning that amino acid pairs within 20 Å of each other are assigned the value of 1. 
From these maps, we see very few differences between the SARS-CoV-2 and SARS S proteins, meaning that they are structurally similar.This figure shows the contact maps of the SARS-CoV-2 S protein (top-left), SARS S protein (top-right), single-chain of the SARS-CoV-2 S protein (bottom-left), and single-chain of the SARS S protein (bottom-right). The map shows every amino acid residue pair in the structure. If the distance between the residue pair is 20.0 Å or less, then a value of 1.0 is assigned and shown in the color black. We see that SARS-CoV-2 and SARS S proteins have similar maps, indicating similar structures.Cross-CorrelationProtein residue cross-correlation shows the correlation between the fluctuations/displacement of residues. This graphical representation shows how the residues will move relative to each other. The pair is assigned the value of 1 if the fluctuations are completely correlated (move in the same direction), the value of -1 if the fluctuations are completely anticorrelated (move in opposite directions), and a value of 0 if uncorrelated (movements do not affect each other). It is typical to see a diagonal of strong cross-correlation because movements in a residue will almost always affect its direct neighbors. Positive correlations coming off the diagonal represent correlations between contiguous residues and are characteristic of secondary structures because residues in secondary structures tend to move together. Common patterns for secondary structures are triangular structures for helices and plume structures for strands. Off-diagonal correlations and anticorrelations may potentially represent interesting interactions between non-contiguous residues and domains. 
From our results, we see that the SARS-CoV-2 and SARS S proteins fluctuate similarly, supporting the conclusion that they are structurally similar.This figure shows the cross-correlation heat maps of the SARS-CoV-2 S protein (top-left), SARS S protein (top-right), single-chain of the SARS-CoV-2 S protein (bottom-left), and single-chain of the SARS S protein (bottom-right). The x-axis and y-axis represent the amino acid residues. The map shows every residue pair in the structure and the colors represent the correlation in the fluctuations of residues. A value of 1.0 (red) means that the residues will fluctuate together in the same direction. A value of -1.0 (dark blue) means that the residues will fluctuate together in opposite directions. A value of 0.0 means no relation between the fluctuations of the residues. We see that SARS-CoV-2 and SARS S proteins have very similar maps.Slow Mode ShapeNMA is based on the idea that the lowest frequency modes describe the largest movement in the structure. Below is the plot of the lowest frequency (slowest) mode calculated by ProDy. Here, the fluctuations are in arbitrary or relative units, but can be interpreted such that greater amplitudes represent regions of greater fluctuation. The sign of the value represents the relative direction of the fluctuation, meaning that the plots can be flipped when comparing between different proteins. In the SARS-CoV-2 Chain A figure, we can see that the protein region between residues 200 and 500 is the most mobile. This region overlaps with where the RBD is located on the chain, between residues 331 and 524. This is important because it indicates that the RBD is a mobile part of the S protein. 
Based on our results, we see that both S proteins have the same regions of large fluctuations, supporting the conclusion that they have similar structures.This figure shows the slow mode plots of the SARS-CoV-2 S protein (top-left), SARS S protein (top-right), single-chain of the SARS-CoV-2 S protein (bottom-left), and single-chain of the SARS S protein (bottom-right). The x-axis represents the amino acid residues and the y-axis represents the fluctuations in relative units. From the single-chain plots for both SARS-CoV-2 and SARS, we see that the residues between 200 and 500 fluctuate the most. The plots between SARS-CoV-2 and SARS are very similar, indicating similar protein fluctuations.Square FluctuationThe slow mode square-fluctuation is calculated by multiplying the square of the slow mode by the variance along the mode. In this case, all the values will be positive, but the interpretation remains the same as the slow mode plot, where greater amplitudes represent regions of greater fluctuations and motions.This figure shows the plots of the slow mode square fluctuation of the SARS-CoV-2 S protein (top-left), SARS S protein (top-right), single-chain of the SARS-CoV-2 S protein (bottom-left), and single-chain of the SARS S protein (bottom-right). The x-axis represents the amino acid residues and the y-axis represents the fluctuations in relative units. The interpretation is the same as the slow mode plot, but with only positive values. The plots between SARS-CoV-2 and SARS are very similar, indicating similar protein fluctuations.Comparing ResultsFrom all four results, we see that SARS-CoV-2 and SARS S proteins are structurally very similar. This is, perhaps, not a surprise given that they are similar in sequence and have the same function of targeting ACE2.ANM Analysis of the RBDThe anisotropic counterpart to GNM, where direction does matter, is called the anisotropic network model (ANM). In ANM, the direction of the fluctuations is also considered. 
Although ANM includes directionality, ANM typically performs worse than GNM when compared with experimental data 2. Nonetheless, ANM calculations are useful because of the added directionality. In fact, we can use it to create animations depicting the range of motions and fluctuations of the protein.In this tutorial, we will use NMWiz, a GUI for ProDy that is available as a VMD plugin, to perform ANM calculations and create the animation of the SARS-CoV-2 (chimeric) RBD using the PDB entry 6vw1.Visit tutorialFrom the tutorial, we were able to generate the cross-correlation map and square fluctuation of the SARS-CoV-2 RBD. The interpretation of these results is identical to the GNM analysis above. Following the same steps, we performed ANM analysis on the SARS RBD using the PDB entry 2ajf for comparison.This figure shows the cross-correlation map (top) and the square fluctuation plot (bottom) of SARS-CoV-2 and SARS RBD using ANM. The y-axis of the square fluctuation plot represents how much the residues fluctuate in relative units. Like the results from the GNM analysis, the map and plot are very similar between the two RBDs, indicating that they are structurally similar.Perhaps unsurprisingly, the maps and plots show very small differences between SARS-CoV-2 and SARS RBD, just like in the GNM calculations for the S proteins. This indicates that the two RBDs are structurally similar.Using NMWiz and VMD, we also created animations of the protein fluctuation calculated through ANM analysis. The following animations are of the SARS-CoV-2 RBD/SARS RBD (purple) and ACE2 (green). 
Important residues from the three sites of conformational differences from the previous lessons are also colored.It is important to note that the fluctuations calculated by ANM provide information on possible movement and flexibility, but do not depict actual protein movements.SARS-CoV-2 Spike Chimeric RBD (6vw1): SARS-CoV-2 (Chimeric) RBD Purple Resid 476 to 486 (Loop) Yellow Resid 455 (Hotspot 31) Blue Resid 493 (Hotspot 31) Orange Resid 501 (Hotspot 353) Red ACE2 Green Resid 79, 82, 83 (Loop) Silver Resid 31, 35 (Hotspot 31) Orange Resid 38, 353 (Hotspot 353) Red SARS Spike RBD (2ajf): SARS RBD Purple Resid 463 to 472 (Loop) Yellow Resid 442 (Hotspot 31) Orange Resid 487 (Hotspot 353) Red ACE2 Green Resid 79, 82, 83 (Loop) Silver Resid 31, 35 (Hotspot 31) Orange Resid 38, 353 (Hotspot 353) Red Using both the GNM and ANM approaches for normal mode analysis of the SARS-CoV-2 S protein, we saw that it is structurally very similar to the SARS S protein. As we have stated in the Structural and ACE2 Interaction Differences and Interaction Energy with ACE2 lessons, the structural differences can be very subtle yet still contribute greatly to ACE2 binding affinity.Next lesson Skjaerven, L., Hollup, S., Reuter, N. 2009. Journal of Molecular Structure: THEOCHEM 898, 42-48. https://doi.org/10.1016/j.theochem.2008.09.024 ↩ Yang, L., Song, G., Jernigan, R. 2009. Protein elastic network models and the ranges of cooperativity. PNAS 106(30), 12347-12352. https://doi.org/10.1073/pnas.0902159106 ↩ "
} ,
{
"title" : "VMD Tutorial",
"category" : "",
"tags" : "",
"url" : "/coronavirus/VMDTutorial",
"date" : "",
"content" : "This is a short tutorial on how to use VMD to visualize molecules and perform some basic analysis. Before you start, make sure to have downloaded and installed VMD.Loading MoleculesThese steps will be on how to load molecules into VMD. We will use the example of 6vw1.Download the protein structure of 6vw1 from the protein data bank.Next we can launch VMD and load the molecule into the program. In VMD Main, navigate to File > New Molecule. Click Browse, select the molecule (6vw1.pdb) and click Load.The molecule should now be listed in VMD Main as well as the visualization in the OpenGL Display.Section to be movedGlycansFor VMD, there is no specific keyword to select glycans. A workaround is to use the keywords: “not protein and not water”. To recreate the basic VMD visualizations from the module of the open-state (6vyb) of SARS-CoV-2 Spike, use the following representations. (For the protein chains, use Glass3 for Material).The end result should look like this: Visualization Exercise Try to recreate the visualization of Hotspot31 for SARS-CoV-2 (same molecule as the tutorial). The important residues and their corresponding colors are listed on the left. "
} ,
{
"title" : "Ab initio Protein Structure Prediction",
"category" : "",
"tags" : "",
"url" : "/coronavirus/ab_initio",
"date" : "",
"content" : "Distributing the work of protein structure prediction around the worldThe determination of the SARS-CoV-2 spike protein’s structure was remarkable because in many senses it was a community effort, dividing the computational heavy lifting over thousands of volunteers’ computers around the world. Two leading structure prediction projects, Rosetta@home and Folding@home, encourage volunteers to download their software and contribute to a gigantic distributed effort to predict protein shape. Even with a modest laptop, a user can donate some of their computer’s idle resources to contribute to the problem of protein structure prediction. But how does this software work?Predicting a protein’s structure using only its amino acid sequence is called ab initio structure prediction (ab initio is from the Latin for “from the beginning”). In this lesson, we will explain a little about how ab initio structure prediction algorithms work.As we dive into structure prediction, we should be more precise about two things. First, we will specify what we mean by the “structure” of a protein. Second, although we know that a polypeptide always folds into the same final three-dimensional shape, we have not said anything about why a protein folds in a certain way. We will need a better understanding of how the physicochemical properties of amino acids affect a protein’s final structure.The four levels of protein structure“Protein structure” is a broad term that encapsulates four different levels of description. A protein’s primary structure refers to the amino acid sequence of the polypeptide chain. The primary structure of human hemoglobin subunit alpha can be downloaded here, and the primary structure of the SARS-CoV-2 spike protein can be downloaded here.A protein’s secondary structure describes its highly regular, repeating substructures that serve as intermediate structures forming before the overall protein structure comes together. 
The two most common such substructures, shown in the figure below, are alpha helices (left) and beta sheets (right). Alpha helices occur when nearby amino acids wrap around to form a tube-like structure; beta sheets occur when nearby amino acids line up side-by-side to form a sheet-like structure.General shape of secondary structure alpha helices (left) and beta sheets (right). Source: Cornell, B. (n.d.). https://ib.bioninja.com.au/higher-level/topic-7-nucleic-acids/73-translation/protein-structure.htmlA protein’s tertiary structure describes its final 3D shape after the polypeptide chain has folded and is stable. Throughout this module, when discussing the “shape” or “structure” of a protein, we are almost exclusively referring to its tertiary structure. The figure below shows the tertiary structure of human hemoglobin subunit alpha. Note that for the sake of simplicity, the figure does not show the positions of every atom in the protein but rather represents the protein shape as a composition of secondary structures.The tertiary structure of human hemoglobin subunit alpha. Within the structure are multiple alpha helix secondary structures. Source: https://www.rcsb.org/structure/1SI4.Finally, some proteins have a quaternary structure, which describes the protein’s interaction with other copies of itself to form a single functional unit, or a multimer. Many proteins do not have a quaternary structure and function as an independent monomer. The figure below shows the quaternary structure of hemoglobin, which is a multimer consisting of two alpha subunits and two beta subunits.The quaternary structure of human hemoglobin, which consists of two alpha subunits (shown in red) and two beta subunits (shown in blue). 
Source: https://commons.wikimedia.org/wiki/File:1GZX_Haemoglobin.png.As for SARS-CoV and SARS-CoV-2, the spike protein is a homotrimer, meaning that it is formed of three essentially identical units called chains, each one translated from the corresponding region of the coronavirus’s genome. When we talk about identifying the structure of the spike protein in this module, we typically are referring to the structure of a single chain.The structural units making up proteins are often hierarchical, and the spike protein is no exception. Each spike protein chain is a dimer, consisting of two subunits called S1 and S2. Each of these subunits further divides into protein domains, distinct structural units within the protein that fold independently and are typically responsible for a specific interaction or function. For example, the SARS-CoV-2 spike protein has a receptor binding domain (RBD) located on the S1 subunit that is responsible for interacting with the human ACE2 enzyme; the rest of the protein does not come into contact with ACE2. We will say more about the RBD soon.Proteins seek the lowest energy conformationNow that we know a bit more about how protein structure is defined, we will discuss why proteins fold in a certain way every time. In other words, what are the factors driving nature’s magic protein folding algorithm?Amino acids’ variety of side chains causes the amino acids to have different chemical properties, which can lead to different conformations being more chemically “preferable” than others. For example, the table below shows the twenty standard amino acids occurring in proteins grouped by chemical properties. Nine of these amino acids are hydrophobic (also called non-polar), meaning that their side chains tend to be repelled by water, and as a result we tend to find these amino acids sheltered from the environment on the interior of the protein.A chart of the twenty amino acids grouped by chemical properties. 
The side chain of each amino acid is highlighted in blue. Source: OpenStax Biology. http://cnx.org/contents/185cbf87-c72e-48f5-b51e-f14f21b5eabd@14.1.We can therefore view protein folding as finding the tertiary structure that is the most stable given a polypeptide’s primary structure. A central theme of the previous module on bacterial chemotaxis was that a system of chemical reactions moves toward equilibrium. The same principle is true of the magic folding algorithm; when a protein folds into its final structure, it is obtaining a conformation that is as chemically stable as possible. The polypeptide starts as just a bonded chain of amino acids, but it ends as a folded protein, incorporating bonds and interactions between different parts of the protein along the way.To be more precise, the potential energy (sometimes called free energy) of a molecule is the energy stored within an object due to its position, state, and arrangement. In molecular mechanics, the potential energy is made up of the sum of bonded energy and non-bonded energy.Bonded energy derives from the protein’s covalent bonds, as well as the angles of bonds between adjacent amino acids, and the torsion angles that we saw in the previous lesson, as the protein bends and twists.Non-bonded energy comprises electrostatic interactions (Coulomb potential) and van der Waals interactions (Lennard-Jones potential). Electrostatic interactions refer to the attraction and repulsion forces arising from the electric charges of pairs of atoms. For example, nonpolar amino acids are repelled by water, which is polar, meaning that its constituent atoms have electric charges. Because a water molecule’s oxygen atom has a negative charge and its hydrogen atoms have a positive charge, the molecule is not attracted to a nonpolar molecule, which lacks the opposite charges needed for attraction. As for van der Waals interactions, atoms are dynamic systems. 
The electrons are constantly buzzing around the nucleus, and at any given moment, they could be unevenly distributed on one side of the nucleus. Because electrons are negatively charged, the atom will have a temporary negative charge on the side with the excess electrons (and a temporary positive charge on the opposite side). These temporary charges are called induced dipoles, and van der Waals interactions refer to the attraction and repulsion between atoms because of these induced dipoles.An illustration of how induced dipoles and therefore van der Waals forces arise from random fluctuations in the positions of electrons. Source: http://universe-review.ca/F12-molecule12.htm.As the protein folds, it seeks a conformation of lowest total potential energy based on all these forces. For a simple analogy, imagine a ball on a slope, as shown in the following figure. Even if the ball bounces around, it will tend to move down the slope. In this analogy, the lower points on the slope correspond to lower energy conformations of a polypeptide.An analogy for a protein folding into a lowest energy structure is a ball on a hill. As the ball is more likely to move down into a valley, a protein is more likely to fold into a low-energy conformation.Modeling ab initio structure prediction as an exploration problemAlthough a host of different algorithms have been developed for ab initio protein structure prediction through the years, these algorithms all find themselves solving a similar problem.Biochemical research has contributed to the development of scoring functions called force fields that compute the potential energy of a candidate protein shape. As a result, for a given choice of force field, we can think of ab initio structure prediction as solving the following problem: given a primary structure of a polypeptide, find its tertiary structure having minimum energy. 
This problem exemplifies an optimization problem, where we look for an object maximizing or minimizing some function subject to constraints.This formulation of protein structure prediction may not strike you as similar to anything that we have done before in this course. However, consider a bacterium exploring an environment for food, as we did in the previous module on chemotaxis. Every point in the bacterium’s “search space” is characterized by a concentration of attractant at that point, and the bacterium’s goal is to reach the point of greatest attractant concentration.In the case of structure prediction, our search space is the collection of all possible conformations of a given protein. And each point in this search space is characterized by the energy of the conformation at the point. Just as we imagined a ball rolling down a hill to find lower energy, we can now imagine “exploring” this space of all conformations in order to find the conformation of lowest energy. This analogy is illustrated in the hypothetical figure below, in which the height of each point is the energy of the associated conformation; our goal, then, is to find the lowest point in this space.We can imagine each conformation of a given protein as occupying a point in a landscape, in which the elevation of a point corresponds to the energy of the conformation at that point. Courtesy: David Beamish.A local search algorithm for ab initio structure predictionNow that we have conceptualized finding the most stable protein structure as exploring a search space, our next question is how to develop an algorithm to explore this space. Continuing the analogy to chemotaxis, our idea is to adapt E. coli’s clever exploration algorithm from a previous lesson to our purposes. 
That is, at every step, we need to sense the “direction” in which the energy function decreases by the most, and then move in this direction.Adapting this exploration algorithm to protein structure prediction requires us to develop a notion of what it means to consider the points “near” a given conformation in a protein search space. Many ab initio algorithms will start at an arbitrary initial conformation and then make a variety of minor modifications to that structure (i.e., nearby points in the space), updating the current conformation to the modification that produces the greatest decrease in free energy. These algorithms then iterate the process of moving in the direction of greatest energy decrease until we reach a conformation for which no nearby points reduce the free energy. Such an approach for structure prediction falls into a broad category of optimization algorithms called local search algorithms.Yet returning to the chemotaxis analogy, imagine what happens if we were to place many small sugar cubes and one large sugar cube into the bacterium’s environment. The bacterium will sense the gradient not of the large sugar cube but of its nearest attractant. As a result, because the smaller food sources outnumber the larger food source, the bacterium will likely not move to the point of greatest attractant concentration. In terms of bacterial exploration, this is a feature, not a bug; if the bacterium exhausts one food source, then it will just move to another. But in terms of protein structure prediction, we should be worried about winding up in such a local minimum, or a point of our search space for which no “neighboring” points have a smaller score.STOP: Do you see any ways in which we could improve our local search approach for structure prediction?Fortunately, we can modify our local search algorithm in a variety of ways. 
First, because the initial conformation chosen has a huge influence on the final conformation that we return, we could run the algorithm multiple times with different starting conformations. This is analogous to allowing multiple bacteria to explore their environment at different starting points. Second, by allowing ourselves to move to a conformation with greater potential energy with some probability, we would give our local search algorithm a chance to “bounce” out of a local minimum. In an approach called simulated annealing, which is borrowed from metallurgy, we reduce the probability of increasing the free energy over time, so that the likelihood of bouncing out of a local minimum decreases over time, and eventually we will settle into a final conformation. Once again, randomness helps us solve practical problems!Applying an ab initio algorithm to a protein sequenceIn the tutorial linked below, we will use the web interface of a software resource called QUARK to run an ab initio structure prediction algorithm. QUARK is even more sophisticated than the algorithm discussed in the previous section. For example, its algorithm applies a combination of multiple scoring functions to look for the lowest energy conformation.Despite the sophistication of software like QUARK, the search space of all conformations is so large (recall Levinthal’s paradox from the previous lesson) that accurately predicting large protein structures remains very difficult. Accordingly, many ab initio approaches restrict the length of the protein sequences they accept. This is the case for QUARK, which limits us to 200 amino acids. 
Since the SARS-CoV-2 spike protein contains 1281 amino acids, we will instead demonstrate how to use this software on the shorter human hemoglobin subunit alpha.Visit tutorialToward a faster approach for protein structure predictionThe figure below shows the top five structures produced by QUARK for human hemoglobin subunit alpha, along with the protein’s experimentally verified structure. It takes a keen eye to see any differences between these structures. We conclude that although ab initio prediction is slow, it is still able to accurately reconstruct a model of this protein from its amino acid sequence.A protein structure of human hemoglobin subunit alpha along with five ab initio models of this protein produced by QUARK. We can see how close all five models are to the experimentally verified structure, as shown in the superimposition of all six structures at right.Yet we also wonder whether we can speed up our structure prediction algorithms so that they will scale to a larger protein like the SARS-CoV-2 spike protein. In the next lesson, we will learn about another type of protein structure prediction that allows researchers to model large proteins by comparing a protein of unknown structure against a database of known structures.STOP: What existing protein structure(s) would you first want to consult when studying the SARS-CoV-2 spike protein?Next lesson"
} ,
{
"title" : "Comparing Protein Structures to Assess Model Accuracy",
"category" : "",
"tags" : "",
"url" : "/coronavirus/accuracy",
"date" : "",
"content" : "Experiments determine the structure of the SARS-CoV-2 spike proteinIn the previous lesson, we saw how to predict the structure of a protein from its sequence and a database of known structures. We then used homology modeling to predict the structure of the SARS-CoV-2 spike protein using three different software resources. This mimics the work of many researchers in January 2020, which included the contributions of volunteers’ computers from around the world.Meanwhile, other scientists were working on verifying the structure of the protein experimentally. On February 25, 2020, researchers from the Seattle Structural Genomics Center for Infectious Disease deposited the result of a cryo-EM experiment determining the structure of the spike protein to the PDB as entry 6vxx. If you would like to explore its shape a bit, check out the 3-D viewer for the protein at http://www.rcsb.org/3d-view/6VXX/1.In this lesson, we will compare our predicted results from the previous lesson to the empirically validated structure of the SARS-CoV-2 spike protein. How well do our models approximate the real structure?Comparing two shapes with the Kabsch algorithmUltimately, the problem of comparing protein structures is intrinsically similar to the comparison of two shapes, a problem that we will discuss first.STOP: Consider the two shapes in the figure below. How similar are they?Even if you think you have a good handle on comparing the above two shapes, it is because humans have very highly evolved eyes and brains that help us quickly cluster and classify the objects that we see in the world. Training a computer to see objects as well as we can is more difficult than you think!Our goal is to develop a “distance function d(S, T) that takes two shapes S and T as input and that quantifies how different these shapes are. 
If the two shapes are the same, then the distance between them should be equal to zero; the more different the shapes, the larger d should become.You may have noticed that the two shapes in the preceding figure are similar; in fact, they are the same. To demonstrate that this is true, we can first move the red shape to superimpose it over the blue shape, then flip the red shape, and finally rotate it so that its boundary coincides with the blue shape, as shown below.We can transform the red shape into the blue shape by translating it, flipping it, and then rotating it.More generally, if S can be translated, flipped, and/or rotated to produce T, then S and T are the same shape, and so d(S, T) is equal to zero. The question is what d(S, T) should be if S and T are not the same shape.Our idea for defining d(S, T) is first to translate, flip, and rotate S so that the resulting transformed shape resembles T “as much as possible”. We will then determine how different the resulting shapes are to determine d(S, T).To this end, we first translate S to have the same centroid (or center of mass) as T. The centroid of S is found at the point (xS, yS) such that xS is the average of x-coordinates on the boundary of S and yS is the average of y-coordinates on the boundary.For example, suppose S is the semicircular arc shown in the figure below, with endpoints (-1, 0) and (1, 0).A semicircular arc with radius 1 corresponding to a circle whose center is at the origin.The x-coordinate xS of this shape’s centroid is clearly zero. But yS is a little trickier to compute and requires us to apply a little calculus, taking the average of the y-values along the entire arc:\[\begin{align*}y_S & = \dfrac{\int_{0}^{\pi}{\sin{\theta}\,d\theta}}{\pi} \\& = \dfrac{-\cos{\pi} + \cos{0}}{\pi} \\& = \dfrac{2}{\pi}\end{align*}\]STOP: Say that we connect (-1, 0) and (1, 0) to form a closed semicircle. 
What will be the centroid of the resulting shape?The centroid of some shapes, like the semicircular arc in the preceding example, can be determined mathematically. But for irregular shapes, we can estimate the centroid of S by sampling n points from the boundary of the shape and taking the point whose coordinates are the average of the x and y coordinates of the sampled points.Returning to our desire to compute d(S, T) for two arbitrary shapes, once we find the centroids of S and T, we translate S so that the two shapes have the same centroid. We then wish to find the rotation of S, possibly along with a flip as well, that makes the shape resemble T as much as possible.Imagine first that we have found the desired rotation; we can then define d(S, T) in the following way. We sample n points along the boundary of each shape, converting S and T into vectors s = (s1, …, sn) and t = (t1, …, tn), where si is the i-th point on the boundary of S. We then compute the root mean square deviation (RMSD) between the two shapes, which is the square root of the average squared distance between corresponding points in the vectors.\[\text{RMSD}(s, t) = \sqrt{\dfrac{1}{n} \cdot (d(s_1, t_1)^2 + d(s_2, t_2)^2 + \cdots + d(s_n, t_n)^2)}\]In this formula, d(si, ti) is the distance between the points si and ti in 2-D or 3-D space as the case may be.Note: RMSD is a very commonly used approach across data science when measuring the differences between two vectors.For an example RMSD calculation, consider the figure below, which shows two shapes with four points sampled from each.Two shapes with four points sampled from each.The distances between corresponding points in this figure are equal to \(\sqrt{2}\), 1, 2, and \(\sqrt{2}\). 
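As a quick sanity check on the RMSD formula above, here is a minimal pure-Python sketch of RMSD between two equal-length point samples. The coordinates below are hypothetical, invented so that the four corresponding distances are \(\sqrt{2}\), 1, 2, and \(\sqrt{2}\):

```python
import math

def rmsd(s, t):
    """Root mean square deviation between two equal-length 2-D point samples."""
    assert len(s) == len(t)
    total = sum((sx - tx) ** 2 + (sy - ty) ** 2
                for (sx, sy), (tx, ty) in zip(s, t))
    return math.sqrt(total / len(s))

# Hypothetical samples whose corresponding distances are sqrt(2), 1, 2, sqrt(2)
s = [(0, 0), (2, 0), (4, 0), (6, 0)]
t = [(1, 1), (2, 1), (6, 0), (7, 1)]
print(rmsd(s, t))  # 1.5
```

The sum of squared distances is 2 + 1 + 4 + 2 = 9, so the function returns \(\sqrt{9/4} = 3/2\), matching the hand calculation.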
As a result, we compute the RMSD as\[\begin{align*}\text{RMSD}(s, t) & = \sqrt{\dfrac{1}{4} \cdot (\sqrt{2}^2 + 1^2 + 2^2 + \sqrt{2}^2)} \\& = \sqrt{\dfrac{1}{4} \cdot 9}\\& = \sqrt{\dfrac{9}{4}}\\& = \dfrac{3}{2}\end{align*}\]STOP: Do you see any issues with using RMSD to compare two shapes?Even if we assume that the shapes have already been overlapped and rotated appropriately, we still need to make sure that we sample enough points to give a good approximation of how different the shapes are. For an extreme example, consider a circle inscribed within a square, as shown in the figure below. If we happened to sample only the four points indicated, we would sample the same points in each shape, and conclude that the RMSD between these two shapes is zero. This issue is easily resolved by making sure to sample enough points to avoid approximation errors.A circle inscribed within a square. Sampling of the four points where the shapes intersect will give a flawed estimate of zero for RMSD.However, all this has left open the fact that we assumed that we had rotated S to be as “similar” to T as possible. In practice, after superimposing S and T to have the same centroid, we will need to find the rotation of S that minimizes the RMSD between our vectorizations of S and T, and this resulting minimum will be what we define as d(S, T). It turns out that there is an approach to find this best rotation called the Kabsch algorithm, which requires some advanced linear algebra and is beyond the scope of our work but is described here.Applying the Kabsch algorithm to protein structure comparisonThe Kabsch algorithm offers a compelling way to determine the similarity of two protein structures. We can convert a protein containing n amino acids into a vector of length n by selecting a single representative point from each amino acid. 
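The full Kabsch algorithm relies on the singular value decomposition, but in two dimensions the optimal rotation has a closed form, which lets us sketch the whole pipeline (center both shapes on their centroids, rotate one optimally, then take RMSD) in a few lines. This is an illustrative 2-D sketch under those assumptions, not the 3-D algorithm applied to real proteins, and it ignores the possible flip:

```python
import math

def center(points):
    """Translate a point set so that its centroid sits at the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def best_rotation_angle(p, q):
    """Angle of the rotation of p minimizing RMSD to q (2-D closed form)."""
    a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(p, q))
    b = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(p, q))
    return math.atan2(b, a)

def aligned_rmsd(s, t):
    """d(S, T): superimpose centroids, optimally rotate s, then compute RMSD."""
    s, t = center(s), center(t)
    theta = best_rotation_angle(s, t)
    c, sn = math.cos(theta), math.sin(theta)
    rotated = [(c * x - sn * y, sn * x + c * y) for x, y in s]
    total = sum((rx - tx) ** 2 + (ry - ty) ** 2
                for (rx, ry), (tx, ty) in zip(rotated, t))
    return math.sqrt(total / len(s))

# A shape and a translated, rotated copy of it should be at distance ~0.
shape = [(1, 0), (0, 2), (-1, 0), (0, -2)]
th = 0.7
copy = [(math.cos(th) * x - math.sin(th) * y + 3,
         math.sin(th) * x + math.cos(th) * y - 1) for x, y in shape]
print(aligned_rmsd(shape, copy))  # ~0
```

Centering removes the translation, and the recovered angle matches the rotation used to build the copy, so the two shapes are reported as (numerically) identical.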
To do so, scientists typically choose the alpha carbon, the amino acid’s centrally located carbon atom that lies on the peptide’s backbone; the position of this atom will already be present in the .pdb file for a given structure.STOP: Can you think of an example where a small difference between protein structures can cause a large inflation in RMSD score?Unfortunately, no perfect metric for shape comparison exists. To see why the Kabsch algorithm can be flawed, consider the figure below showing two toy protein structures. The orange structure (S) is identical to the blue structure (T) except for the change in a single bond angle between the third and fourth amino acids. And yet this tiny change in the protein’s structure causes a significant increase in d(si, ti) for every i greater than 3, which inflates the RMSD.(Top) Two hypothetical protein structures that differ in only a single bond angle between the third and fourth amino acids, shown in red. Each circle represents an alpha carbon. (Bottom left) Overlaying the first three amino acids shows how much the change in the bond angle throws off the computation of RMSD by increasing the distances between corresponding alpha carbons. (Bottom right) The Kabsch algorithm would align the centers of gravity of the two structures in order to minimize RMSD between corresponding alpha carbons. This makes it difficult for the untrained observer to notice that the two proteins only really differ in a single bond angle.Another way in which the Kabsch algorithm can be fooled is in the case of a substructure that is appended to the side of a structure and that throws off the ordering of the amino acids. For example, consider the following toy example of a structure into which we incorporate a loop.A simplification of two protein structures, one of which includes a loop of three amino acids. 
After the loop, each amino acid in the orange structure will be compared against an amino acid that occurs farther along in the blue structure, thus increasing d(si, ti) for each such amino acid.Finally, it may be the case that one or more amino acids are inserted into or deleted from one of the proteins. This mutation would have a similar effect on RMSD as the above figure. For this reason, biologists will often align the two protein sequences first, ignoring any positions that do not have a corresponding amino acid in one of the two proteins. (We will see an example of a protein alignment soon when comparing the coronavirus spike proteins.)In short, if the RMSD of two proteins is large, then we should be wary of concluding that the proteins are very different, and we may need to combine RMSD with other methods of structure comparison. But if the RMSD is small (e.g., just a few angstroms), then we can have some confidence that the proteins are indeed similar.We are now ready to consider the following tutorial, in which we apply the Kabsch algorithm to compare the structures that we predicted for human hemoglobin subunit alpha and the SARS-CoV-2 spike protein against their experimentally validated structures.Visit tutorialAssessing the accuracy of our structure prediction modelsIn the tutorials occurring earlier in this module, we used publicly available protein structure prediction servers to predict the structure of human hemoglobin subunit alpha (using ab initio modeling) and the SARS-CoV-2 spike protein (using homology modeling).Let’s see how well our models performed by showing the values of RMSD produced by the Kabsch algorithm when comparing each of these models against the validated structures.Ab initio (QUARK) models of Human Hemoglobin Subunit AlphaIn the ab initio tutorial, we used QUARK to perform ab initio structure prediction of human hemoglobin subunit alpha from its amino acid sequence, producing five models. 
In the following table, we show the RMSD produced by the Kabsch algorithm for each of these models against the validated structure of this subunit (PDB: 1si4). Quark Model RMSD QUARK1 1.58 QUARK2 2.0988 QUARK3 3.11 QUARK4 1.9343 QUARK5 2.6495 It is tempting to conclude that our ab initio prediction was a success. However, because human hemoglobin subunit alpha is such a short protein (141 amino acids), researchers would consider this RMSD score high.We know that homology modeling will be faster than ab initio modeling. But will it be more accurate as well?Homology models of SARS-CoV-2 S proteinIn the homology tutorial, we used SWISS-MODEL and Robetta to predict the structure of the SARS-CoV-2 spike protein, and we used GalaxyWeb to predict the structure of this protein’s receptor binding domain (RBD). In addition to our predicted models, we will also assess five predicted models of the full SARS-CoV-2 spike protein released early in the COVID-19 pandemic by Rosetta@Home and published to the Seattle Structural Genomics Center for Infectious Disease (SSGCID). Because the work needed to generate these models was distributed over many users’ machines, comparing the RMSD scores obtained by the Rosetta@Home models against our own may provide insights on the effect of computational power on the accuracy of predictions. The SSGCID models can be found here.GalaxyWEBFirst, we consider the GalaxyWEB models that we produced of the spike protein RBD. We compared these models to the validated SARS-CoV-2 RBD (PDB entry: 6lzg). GalaxyWEB RMSD Galaxy1 0.1775 Galaxy2 0.1459 Galaxy3 0.1526 Galaxy4 0.1434 Galaxy5 0.1202 All of these models have an excellent RMSD score and can be considered very accurate. 
Note that their RMSD is more than an order of magnitude lower than the RMSD computed for our ab initio model of hemoglobin subunit alpha, despite the fact that the RBD is longer (229 amino acids).SWISS-MODELWe now shift to homology models of the entire spike protein and start with SWISS-MODEL. We compared each model produced by SWISS-MODEL against the validated structure of the SARS-CoV-2 spike protein (PDB entry: 6vxx). SWISS MODEL RMSD SWISS1 5.8518 SWISS2 11.3432 SWISS3 11.3432 From the scores, we can see that model SWISS1 performed the best. Even though the RMSD score of 5.8518 is significantly higher than what we saw for the GalaxyWEB prediction for the RBD, keep in mind that the spike protein is 1281 amino acids long, and so the sensitivity of RMSD to slight changes should give us confidence that our models are on the right track.RobettaRobetta produced five models of a single chain of the SARS-CoV-2 spike protein. As with the models produced by SWISS-MODEL, we compared each of them against the validated structure of the SARS-CoV-2 spike protein (PDB: 6vxx). Robetta RMSD Robetta1 3.1189 Robetta2 3.7568 Robetta3 2.9972 Robetta4 2.5852 Robetta5 12.0975 STOP: Which do you think performed more accurately on our predictions: SWISS-MODEL or Robetta?Most of the Robetta models for a single chain beat the SWISS-MODEL predictions for the entire protein. This makes it difficult to say at the moment which resource has performed better.SSGCIDAs explained above, the SSGCID models of the S protein released by Rosetta@Home used large amounts of computational power. Therefore, we might expect to see RMSD scores lower than those of our models. Like before, we will compare the models to the validated structure of the spike protein (PDB: 6vxx). This time, we will assess the accuracy of predictions of a single chain as well as of the entire spike protein. 
SSGCID RMSD (Full Protein) RMSD (Single Chain) SSGCID1 3.505 2.7843 SSGCID2 2.3274 2.107 SSGCID3 2.12 1.866 SSGCID4 2.0854 2.047 SSGCID5 4.9636 4.6443 STOP: Consider the following two questions.First, note that SSGCID3 modeled a single chain more accurately, but SSGCID4 modeled a more accurate full protein. What do you think might have caused this?Second, why do you think that the full protein RMSD values are so close to the single chain values?As we might expect due to their access to the resources of thousands of users’ computers, the SSGCID models outperform our SWISS-MODEL models. But it is also worth noting that their RMSD values are not as close to zero as we might expect, even with access to hundreds of contributors’ computational resources. Is protein structure prediction a hopeless problem?Next lesson"
} ,
{
"title" : "Classifiers",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/classifiers",
"date" : "",
"content" : " Need overview of the problem of classification, with some visuals (preferably our own, in a 2-D example) Use Iris flower data set Give overview of most standard approach for classification, kNN Quantifying quality: training and test datasets But this is an algorithm that applies to points. What we have are shapes. How can shapes be assigned to points in space? Section break Another idea: find a way of directly assigning shapes to points. We’ve done this! When we sampled points from a protein structure we sampled n points from the surface. (There is an issue here, which is that we also need the Kabsch algorithm. It may be that we have the exact same proteins, but they have to be aligned and rotated to reveal this.) Better approach is to use distances. Identify distances between points and then try to assign shapes to points – this is the stone tablet problem (perhaps move to intro). Issue: we’d love to build a shape space, but dimension is huge. Need for dimensionality reduction (not if we are doing Kabsch). Applying classifier to our space. Probably need cross-validation as its own section. Epilogue: neural nets? Maybe not depending on Kabsch. "
} ,
{
"title" : "Part 1 Conclusion: Protein Structure Prediction is Solved! (Kinda…)",
"category" : "",
"tags" : "",
"url" : "/coronavirus/conclusion_part_1",
"date" : "",
"content" : "SARS-CoV-2 protein structure prediction and open scienceResearchers have worked for several decades to decipher nature’s magic algorithm for protein folding. The Soviets even founded an entire research insitute dedicated to protein research in 1967. Most of the scientists who were there for its founding are dead now, and yet the institute carries on. Although structure prediction is an old problem, biologists have never given up hope that continued improvements to their algorithms and ever-increasing computational resources would allow them one day to proclaim, “Maybe this is good enough!”.That day has come.Every two years since 1994, a global effort called Critical Assessment of protein Structure Prediction (CASP) has allowed researchers from around the world to test their protein structure prediction algorithms against each other. The contest organizers compile a (secret) collection of experimentally verified protein structures and then run all submitted algorithms against these proteins.The 14th iteration of this contest, held in 2020, was won in a landslide. The second version of AlphaFold, one of the projects of DeepMind (an Alphabet subsidiary), vastly outperformed the world’s foremost structure prediction approaches, including those that we discussed in this module. The AlphaFold algorithm is an extremely involved method based on deep learning. If you’re interested in learning more about this method, consult the AlphaFold website or this excellent blog post by Mohammed al Quraishi: https://bit.ly/39Mnym3.We will show a few plots to illustrate the decisiveness of AlphaFold’s CASP victory. The first graph, which is shown in the figure below, compares the scores of AlphaFold against the second-place algorithm (a product of David Baker’s laboratory, which developed the Robetta and Rosetta@Home software that we used in this module).Instead of using RMSD, CASP scores a predicted structure against a known structure using the global distance test (GDT). 
For some threshold t, we first take the percentage of alpha carbon positions for which the distance between corresponding alpha carbons in the two structures is at most t. The GDT score that CASP uses then averages the percentages obtained when t is equal to each of 1, 2, 4, and 8 angstroms. A GDT score of 90% is considered good, and a score of 95% is considered excellent (i.e., comparable to minor errors resulting from experimentation) 1.A plot of GDT scores for the 1st place (AlphaFold2) and 2nd place (Baker lab) submissions over all proteins in the CASP14 contest. Source: https://bit.ly/39Mnym3.We can appreciate the margin of victory over the second-place competitor if we compare this second-place competitor against the third-place competitor (submitted by the Yang Zhang lab). The results are shown in the figure below.A plot of GDT scores for the 2nd place (Baker lab) and 3rd place (Zhang lab) submissions over all proteins in the CASP14 contest. Source: https://bit.ly/39Mnym3.For each protein target in the contest, we can determine each algorithm’s z-score. This score is defined as the number of standard deviations that the algorithm’s GDT score falls from the mean GDT score for all competitors. For example, a z-score of 1.4 would be 1.4 standard deviations above the mean, and a z-score of -0.9 would be 0.9 standard deviations below the mean.By summing all of an algorithm’s positive z-scores, we obtain a reasonable metric for the relative quality of an algorithm compared to its competitors. If an algorithm’s sum of z-scores is large, then the algorithm racked up lots of positive z-scores, and we can conclude that it performed well. 
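Both scoring schemes just described are easy to prototype. Below is a hedged sketch: `gdt_ts` averages the fraction of alpha carbons whose counterparts lie within 1, 2, 4, and 8 angstroms, and `positive_z_sums` turns a hypothetical table of per-target GDT scores into summed positive z-scores. The population standard deviation is assumed here; CASP's exact bookkeeping may differ in detail:

```python
import math

def gdt_ts(distances):
    """GDT score: average percentage of alpha carbons whose corresponding
    alpha carbon lies within 1, 2, 4, and 8 angstroms."""
    n = len(distances)
    fractions = [sum(d <= t for d in distances) / n for t in (1, 2, 4, 8)]
    return 100 * sum(fractions) / len(fractions)

def positive_z_sums(scores):
    """scores[i][j] is competitor i's GDT score on target j. Returns, for
    each competitor, the sum of its positive z-scores across all targets."""
    n_targets = len(scores[0])
    sums = [0.0] * len(scores)
    for j in range(n_targets):
        column = [row[j] for row in scores]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((s - mean) ** 2 for s in column) / len(column))
        for i, s in enumerate(column):
            z = (s - mean) / std if std > 0 else 0.0
            sums[i] += max(z, 0.0)  # only positive z-scores count
    return sums

# Hypothetical per-residue distances and a tiny 3-competitor, 2-target contest.
print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
print(positive_z_sums([[90, 80], [70, 80], [50, 50]]))
```

In the toy contest, the first competitor scores above the mean on both targets and so accumulates the largest sum, while the third never scores above the mean and receives zero.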
The figure below shows the sum of z-scores for all CASP14 participants and reiterates the margin of AlphaFold’s victory.Sum of z-scores for every CASP14 participant. Source: https://predictioncenter.org/casp14/zscores_final.cgi.AlphaFold’s CASP14 victory led some scientists – and media outlets – to declare that protein structure prediction had finally been solved 2. Yet some critics remained skeptical.AlphaFold obtained an impressive median RMSD of 1.6 for its predicted proteins 1, but to be completely trustworthy for a sensitive application like designing drug targets, a predicted protein structure would need to have an RMSD nearly an order of magnitude lower.Furthermore, about a third of AlphaFold’s CASP14 predictions have an RMSD over 2.0, an often-used threshold for whether a predicted structure is reliable. And there is no way of knowing in advance whether AlphaFold will perform well on a given protein, unless we validate the protein’s structure, which causes a catch-22. For example, AlphaFold published their predictions of the structures of other SARS-CoV-2 proteins3, none of which had validated structures in 2020. Probably most of these predictions are accurate, but we cannot know for sure unless we run an experiment to verify their structures.And although AlphaFold released the non-spike protein predicted structures, they have thus far neglected to publish their prediction of the SARS-CoV-2 spike protein, or to explain the details of their algorithm (e.g., in a peer-reviewed forum). Without these efforts, the project invites criticism from open science advocates.Finally, because AlphaFold applies a deep learning approach, the algorithm is “trained” using a database of known protein structures, which makes it more likely to succeed if a protein is similar to a known structure. 
But it is the proteins with structures dissimilar to any known structure that hold some of the greatest scientific interest.Pronouncing protein structure prediction to be solved may be dubious, but it is fair to acknowledge that we will likely never again see such a clear improvement to the state of the art. AlphaFold is quite possibly the final great innovation in a research problem that has puzzled biologists for fifty years.Thus ends part 1 of this module, but there is still much for us to discuss. We hope that you will join us for part 2, in which we will delve further into measuring the differences between the spike proteins of SARS-CoV-1 and SARS-CoV-2 using the validated protein structures published to PDB early in the pandemic. Can we use modeling and computation to determine why SARS-CoV-2 has been so much more infectious? We hope that you will join us to find out.Continue to part 2: spike protein comparison AlQuraishi, M. 2020, December 8. AlphaFold2 @ CASP14: “It feels like one’s child has left.” Retrieved January 20, 2021, from https://bit.ly/39Mnym3 ↩ ↩2 Service, R. F. (2020, November 30). ‘The game has changed.’ AI triumphs at solving protein structures. Science. doi:10.1126/science.abf9367 ↩ Computational predictions of protein structures associated with COVID-19 [Web log post]. (2020, August 04). Retrieved January 20, 2021, from https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19 ↩ "
} ,
{
"title" : "Part 2 Conclusion: From Static Protein Analysis to Molecular Dynamics",
"category" : "",
"tags" : "",
"url" : "/coronavirus/conclusion_part_2",
"date" : "",
"content" : "Modeling protein bonds using tiny springsTo conclude part 2 of this module, we transition from the static study of proteins to the field of molecular dynamics (MD), in which we simulate the movement of proteins’ atoms, along with their interactions as they move.You may think that simulating the movments of proteins with hundreds of amino acids will be a hopeless task. After all, predicting the static structure of a protein has occupied biologists for decades! Yet part of what makes structure prediction so challenging is that the “search space” of potential shapes is so enormous. Once we have established the static structure of a protein, its dynamic behavior will not allow it to deviate greatly from this static structure, and so the space of potential structures is automatically narrowed down to those that are similar to the static structure.A protein’s molecular bonds are constantly vibrating, stretching and compressing, much like that of the oscillating mass-spring system shown in the figure below. Bonded atoms are held together by sharing electrons and are held at specific bond length due to the attraction and repulsion forces of the negatively charged electrons and positively charged nucleus. If you push the atoms closer together or pull them farther apart, they will “bounce back” to their equilibrium.A mass-spring system in which a mass is attached to the end of a spring. The more we move the mass from its equilibrium, the greater its resistance and the more it will be repelled back toward equilibrium. Courtesy: flippingphysics.com.In an elastic network model (ENM), we imagine nearby alpha carbons of a protein structure to be connected by springs. 
Because distant atoms will not influence each other, we will only connect two alpha carbons if they are within some threshold distance of each other (the default threshold used by ProDy is seven angstroms).A major strength of ProDy is its implementation of a Gaussian network model (GNM), an ENM for molecular dynamics; the GNM is called “Gaussian” because protein bond movements follow normal (Gaussian) distributions around their equilibria. Furthermore, this model is isotropic, meaning that it only considers the magnitude of force exerted on the springs between nearby molecules and ignores any global effect on the directions of these forces.Although it may seem that atomic movements are frantic and random, the movements of protein atoms are in fact heavily coordinated, owing to the evolution of the proteins to perform replicable tasks. As a result, the oscillations of these particles are often highly structured and can be summarized by a combination of simpler component functions, or modes. (For those familiar with Fourier analysis, this is analogous to the fact that a function under certain conditions can be approximated using a sum of sine and cosine waves.) The paradigm resulting from the insight of breaking down oscillations into a comparatively small number of modes that summarize them is called normal mode analysis (NMA) and powers the elastic model that ProDy implements.We will say more about NMA later in this lesson, but the details rely on some advanced linear algebra and are too technical for our aims in this course. For those interested, a full treatment of the mathematics of GNMs can be found in the chapter at https://www.csb.pitt.edu/Faculty/bahar/publications/b14.pdf.By running molecular dynamics simulations, we obtain another way to study two homologous proteins by comparing their patterns of fluctuation under perturbation. 
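The elastic-network construction described above (springs between alpha carbons within a cutoff) boils down to building a connectivity matrix. Here is a minimal sketch with hypothetical coordinates and the seven-angstrom cutoff mentioned above; ProDy's actual implementation differs in detail:

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity (Kirchhoff) matrix of an elastic network: K[i][j] = -1
    if alpha carbons i and j lie within `cutoff` angstroms of each other,
    and each diagonal entry counts that residue's contacts."""
    n = len(coords)
    k = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                k[i][j] = k[j][i] = -1  # spring between i and j
                k[i][i] += 1
                k[j][j] += 1
    return k

# Three hypothetical alpha carbons in a line, 5 angstroms apart:
# only neighboring pairs fall within the 7-angstrom cutoff.
print(kirchhoff_matrix([(0, 0, 0), (5, 0, 0), (10, 0, 0)]))
# [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```

In GNM, the normal modes are obtained from the eigenvectors of this matrix, which is why building it correctly is the first step of the analysis.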
With this in mind, we will use ProDy to perform NMA calculations as a final method of comparing the SARS-CoV-2 and SARS-CoV spike proteins. We also will use ProDy to compute a contact map, if you are interested in doing this after our discussion of contact maps in a previous lesson. When we return from the tutorial, we will explain each of the analyses that we perform in the tutorial.Visit tutorialMolecular dynamics analyses of SARS-CoV and SARS-CoV-2 spike proteins using GNMIn the tutorial, we used ProDy to generate visualizations of how the SARS-CoV-2 spike protein fluctuates compared to that of SARS-CoV. Here, we will explain how to interpret the results and compare them to analyze the similarities between the two proteins.Cross-CorrelationMuch as a contact map indicated which amino acids in a protein structure are close to each other, we will use a cross-correlation map to show whether the movements of different amino acids are coordinated as the protein flexes. A matrix M receives a value at M(i, j) equal to the correlation between the movements of the i-th and j-th amino acids in a protein structure. The values of this matrix are decimals ranging from -1 to 1. M(i, j) is equal to 1 if the movements are completely correlated (both amino acids always move in the same direction), a value of -1 if the movements are completely anticorrelated (both amino acids always move in opposite directions), and a value of 0 if the movements are completely uncorrelated.Much as the contact map typically has many values equal to 1 near the main diagonal, we commonly see a diagonal of strong cross-correlation values (i.e., either close to -1 or close to 1) because movements in an amino acid will almost always affect nearby amino acids.Positive correlations near the diagonal represent correlations between contiguous residues and are characteristic of secondary structures (e.g., alpha helices and beta sheets), in which amino acids tend to move together. 
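To make the matrix M(i, j) concrete, the sketch below estimates it from hypothetical per-residue fluctuation series (one scalar fluctuation per residue per frame, for simplicity). Note that GNM derives these correlations analytically from the network model rather than from a recorded trajectory; this is only an illustration of the normalization:

```python
import math

def cross_correlation(fluctuations):
    """fluctuations[i] is residue i's fluctuation (deviation from its mean
    position) across a series of frames. Returns the matrix M with
    M[i][j] = normalized correlation between residues i and j, in [-1, 1]."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    n = len(fluctuations)
    return [[dot(fluctuations[i], fluctuations[j]) /
             math.sqrt(dot(fluctuations[i], fluctuations[i]) *
                       dot(fluctuations[j], fluctuations[j]))
             for j in range(n)] for i in range(n)]

# Residues 0 and 1 always move together; residue 2 moves opposite to them.
m = cross_correlation([[1, -2, 3], [2, -4, 6], [-1, 2, -3]])
print(round(m[0][1], 3), round(m[0][2], 3))  # 1.0 -1.0
```

Perfectly coordinated residues score 1, perfectly opposed residues score -1, and every diagonal entry is 1 by construction.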
Correlations and anticorrelations off the diagonal (i.e., for amino acids distant from each other in the protein structure) may represent interesting interactions between non-contiguous residues and domains for further study.From our results, we see that the SARS-CoV-2 and SARS-CoV spike proteins fluctuate similarly, supporting the idea that they not only have similar structures, but similar dynamics as well.The cross-correlation heat maps of the SARS-CoV-2 spike protein (top-left), SARS-CoV spike protein (top-right), single chain of the SARS-CoV-2 spike protein (bottom-left), and single-chain of the SARS-CoV spike protein (bottom-right). The map shows every residue pair in the structure and the colors represent the correlation in the fluctuations of residues as shown in the spectrum. A value of 1.0 (red) means that the amino acids’ movements are perfectly correlated, and a value of -1.0 (dark blue) means that their movements are perfectly anticorrelated.Slow mode shape and square fluctuationsAbove, we pointed out that in NMA, we break down the complex movements of a protein in terms of a few simpler component functions called “modes”. The mode having the greatest contribution to these fluctuations (called the “slowest” mode) is charted in the figure below, called a slow mode shape plot, for the SARS-CoV-2 and SARS-CoV spike proteins. The amino acid positions are across the x-axis, and the direction/magnitude of movement is shown on the y-axis. Positive and negative values correspond to opposite directions of movement, and the farther a value is from zero, the more this position moves with respect to the given mode.In this figure, we can see that the protein region between positions 200 and 500 of the spike protein is the most mobile. This region overlaps with the RBD region, found between residues 331 and 524. 
This analysis indicates that the RBD is a relatively mobile part of the spike protein, which matches our intuition that the RBD might need to be flexible in order to “catch” the moving target of an ACE2 enzyme and latch onto it.Slow mode plots of the SARS-CoV-2 spike protein (top-left), SARS-CoV spike protein (top-right), single chain of the SARS-CoV-2 spike protein (bottom-left), and single chain of the SARS-CoV spike protein (bottom-right). The x-axis represents the amino acid positions along the protein, and the y-axis represents the relative fluctuations at each amino acid position. From the single-chain plots for both SARS-CoV-2 and SARS, we see that the residues between 200 – 500 fluctuate the most. The plots between SARS-CoV-2 and SARS-CoV are very similar, indicating similar protein fluctuations for this mode.A related plot called a slow mode square fluctuations plot is similar to the slow mode shape plot, except that its values are produced by multiplying the square of the slow mode by the variance along the mode. In this case, all the values will be positive, and larger amplitudes represent regions of greater fluctuation. As with the slow mode plots, the square fluctuations plots for SARS-CoV-2 and SARS-CoV shown below indicate that the RBD is highly mobile compared with the rest of the spike protein.Plots of the slow mode square fluctuation of the SARS-CoV-2 spike protein (top-left), SARS-CoV spike protein (top-right), a single chain of the SARS-CoV-2 spike protein (bottom-left), and a single chain of the SARS-CoV spike protein (bottom-right). The x-axis represents the amino acid positions along the protein, and the y-axis is proportional to the square of the fluctuations at each amino acid position. 
The plots between SARS-CoV-2 and SARS-CoV are very similar, indicating similar protein fluctuations for this mode.Comparing ResultsFrom these results, we can see that the SARS-CoV-2 and SARS-CoV spike proteins are not only very similar in terms of structure, but they are similar in terms of dynamics as well. This result is perhaps not a surprise since they both target the ACE2 enzyme, and it drives home the fact that proteins can seem almost identical and yet one can have very subtle changes that turn an outbreak into a pandemic.ANM models account for the direction of protein fluctuationsThe anisotropic counterpart to GNM, in which the direction of fluctuations is also considered, is called the anisotropic network model (ANM). Although ANM includes directionality, it typically performs worse than GNM when compared with experimental data1. However, this model offers the benefit that it can be used to create animations depicting the range of motions and fluctuations of the protein.In the tutorial linked below, we will apply ANM to produce versions of the plots that we produced above. We will also encounter NMWiz, which is short for “normal mode wizard”, a GUI for ProDy that is available as a plugin for VMD. We will use NMWiz to perform ANM calculations and create an animation of the SARS-CoV-2 (chimeric) RBD (PDB entry: 6vw1) and the SARS-CoV RBD (PDB entry: 2ajf).Visit tutorialANM analysis of the coronavirus binding domainIn the tutorial, we were able to generate a cross-correlation map and square fluctuation plot for the SARS-CoV-2 RBD, which resemble the results that we obtained previously for GNM (see figure below). Unsurprisingly, we do not see significant differences between the plots for the two viruses.The cross-correlation map (top) and the square fluctuation plot (bottom) for the SARS-CoV-2 (left) and SARS (right) RBDs using ANM. 
Like the results from the GNM analysis, the map and plot are very similar between the two RBDs, indicating that their dynamics are similar. The fluctuations calculated by ANM provide information on possible movement and flexibility but do not depict actual protein movements. To predict these movements, we used NMWiz and VMD to create animations of the protein fluctuations over time as calculated via ANM analysis. The following two animations show the complex of each virus’s RBD (purple) bound with ACE2 (green). Important residues from the three sites of conformational differences from the previous lessons are also highlighted. SARS-CoV spike protein RBD (PDB: 2ajf): SARS RBD, purple; Resid 463 to 472 (loop), yellow; Resid 442 (hotspot 31), orange; Resid 487 (hotspot 353), red; ACE2, green; Resid 79, 82, 83 (loop), silver; Resid 31, 35 (hotspot 31), orange; Resid 38, 353 (hotspot 353), red. SARS-CoV-2 spike protein chimeric RBD (PDB: 6vw1): SARS-CoV-2 (chimeric) RBD, purple; Resid 476 to 486 (loop), yellow; Resid 455 (hotspot 31), blue; Resid 493 (hotspot 31), orange; Resid 501 (hotspot 353), red; ACE2, green; Resid 79, 82, 83 (loop), silver; Resid 31, 35 (hotspot 31), orange; Resid 38, 353 (hotspot 353), red. Recall from our work in the previous lesson that the greatest contribution of negative energy to the RBD/ACE2 complex in SARS-CoV-2 was the region called “hotspot 31”. This region is highlighted in blue and orange in the above figures. If you look very closely (you may need to zoom in), as the protein swings in to bind with ACE2, the blue and orange regions appear to line up just a bit more naturally in the SARS-CoV-2 animation than in the SARS-CoV animation. That is, the improved binding that we hypothesized for a static structure appears to be confirmed by dynamics simulations. 
This provides one more piece of evidence that SARS-CoV-2 is more effective at binding to the ACE2 enzyme. Summing Up: In this module, we have discussed a wide range of computational methods surrounding the analysis of proteins. We began with a discussion of the fundamental problem of determining a protein’s structure. Because experimental methods for identifying protein structure are costly and time-consuming, we transitioned to discuss algorithmic approaches that do a good job of predicting a protein’s structure from its sequence of amino acids. We then transitioned to the problem of comparing structures for related proteins, with a lengthy case study on comparing the SARS-CoV and SARS-CoV-2 spike protein structures. We saw that the problem of quantifying the “difference” between two shapes is more challenging than it might seem, and we established both global and local structure comparison metrics. We applied these approaches to isolate three candidate regions of the SARS-CoV-2 spike protein that seem to bind better to the ACE2 enzyme, and we quantified this binding using a localized energy function. We then saw that to infer a protein’s function, we need to move from studying structure to molecular dynamics, studying how the protein behaves within its environment as it flexes and bends in order to interact with other molecules. This is a great deal of ground to have covered, but if we would like to present an ultimate moral to this chapter, it is that biology is an extremely complex subject. The structure prediction problem is decades old and still not fully solved, and computational approaches for studying protein structure and dynamics are sophisticated. But there is just as much that we have left undiscussed. What happens after the spike protein binds to ACE2? How does the virus enter the cell? How does it replicate itself? How does it fight our immune systems, and how can we design a vaccine to fight back? 
We would need far more time than we have here to treat all of these topics, but if you are interested in an online course covering some of them, then check out the free online course SARS Wars: A New Hope by our colleague Christopher James Langmead. Thus concludes the third module of this course. In the course’s final module, we will turn our attention to a very different type of problem. To fight a virus like SARS, your body employs a cavalry of white blood cells. Maintaining healthy levels of these cells is vital to a strong immune system, and blood reports run counts of these cells to ensure they are within normal ranges. Can we teach a computer to run this analysis automatically? We hope you will join us to find out! (New module coming soon.) Yang, L., Song, G., Jernigan, R. 2009. Protein elastic network models and the ranges of cooperativity. PNAS 106(30), 12347-12352. https://doi.org/10.1073/pnas.0902159106 ↩ "
} ,
{
"title" : "Part 2 Conclusion: From Static Protein Analysis to Molecular Dynamics",
"category" : "",
"tags" : "",
"url" : "/coronavirus/conclusion_part_2_draft",
"date" : "",
"content" : "Modeling protein bonds using tiny springsTo conclude part 2 of this module, we transition from the static study of proteins to the field of molecular dynamics (MD), in which we simulate the movement of proteins’ atoms, along with their interactions as they move.You may think that simulating the movments of proteins with hundreds of amino acids will be a hopeless task. After all, predicting the static structure of a protein has occupied biologists for decades! Yet part of what makes structure prediction so challenging is that the “search space” of potential shapes is so enormous. Once we have established the static structure of a protein, its dynamic behavior will not allow it to deviate greatly from this static structure, and so the space of potential structures is automatically narrowed down to those that are similar to the static structure.A protein’s molecular bonds are constantly vibrating, stretching and compressing, much like that of the oscillating mass-spring system shown in the figure below. Bonded atoms are held together by sharing electrons and are held at specific bond length due to the attraction and repulsion forces of the negatively charged electrons and positively charged nucleus. If you push the atoms closer together or pull them farther apart, they will “bounce back” to their equilibrium.A mass-spring system in which a mass is attached to the end of a spring. The more we move the mass from its equilibrium, the greater its resistance and the more it will be repelled back toward equilibrium. Courtesy: flippingphysics.com.In an elastic network model (ENM), we imagine nearby alpha carbons of a protein structure to be connected by springs. 
Because distant atoms will not influence each other, we will only connect two alpha carbons if they are within some threshold distance of each other (the default threshold used by ProDy is seven angstroms). A major strength of ProDy is its implementation of a Gaussian network model (GNM), an ENM for molecular dynamics; the GNM is called “Gaussian” because protein bond movements follow normal (Gaussian) distributions around their equilibria. Furthermore, this model is isotropic, meaning that it only considers the magnitude of force exerted on the springs between nearby molecules and ignores any global effect on the directions of these forces. Although it may seem that atomic movements are frantic and random, the movements of protein atoms are in fact heavily coordinated, owing to the evolution of the proteins to perform replicable tasks. As a result, the oscillations of these particles are often highly structured and can be summarized by using a combination of functions explaining them, or modes. (For those familiar with Fourier analysis, this is analogous to the fact that a function under certain conditions can be approximated using a sum of sine and cosine waves.) The paradigm resulting from the insight of breaking down oscillations into a comparatively small number of modes that summarize them is called normal mode analysis (NMA) and powers the elastic model that ProDy implements. Introduction to GNM: Performing GNM analysis on a protein gives us a fairly accurate understanding of how the protein is structured, particularly of the flexibility of the protein and how each residue moves relative to the rest. In this section, we will revisit human hemoglobin (1A3N.pdb) to perform GNM analysis. Recall that in GNM, the target molecule is represented using an ENM. Therefore, the first step in GNM analysis is to convert hemoglobin into a system of nodes and springs. 
As mentioned above, this can easily be done by stripping the protein down to its alpha carbons and connecting alpha carbons that are within a threshold distance. Generally, the threshold distance for GNM is set between 7 and 8 Å. Conversion of human hemoglobin (left) to an elastic network model with a cutoff distance of 7.3 Å (right). Each node in the model is subject to Gaussian fluctuations that cause it to deviate in position from its equilibrium. As a direct consequence, the distance between nodes will also undergo Gaussian fluctuations. For a given node i and node j, the equilibrium positions are represented by the equilibrium position vectors \(R_i^0\) and \(R_j^0\). The fluctuations of node i and node j are represented by the instantaneous fluctuation vectors \(\Delta R_i\) and \(\Delta R_j\). The distance between node i and node j at equilibrium is represented by the equilibrium distance vector \(R_{ij}^0\), and the distance between nodes i and j during fluctuation is represented by the instantaneous distance vector \(R_{ij}\). Finally, we can calculate the fluctuation in the distance, \(\Delta R_{ij} = R_{ij} - R_{ij}^0 = \Delta R_j - \Delta R_i\). Schematic showing Gaussian fluctuations between two nodes. Equilibrium positions of node i and node j are represented by distance vectors \(R_i^0\) and \(R_j^0\). The equilibrium distance between the nodes is labeled \(R_{ij}^0\). The instantaneous fluctuation vectors are labeled \(\Delta R_i\) and \(\Delta R_j\), and the fluctuation in the distance is labeled \(\Delta R_{ij}\). Image courtesy of Ahmet Bakan. The next step is to construct a Kirchhoff matrix, also known as the Laplacian matrix or connectivity matrix, represented by the symbol \(\Gamma\). Commonly used in graph theory, the Kirchhoff matrix is essentially a square matrix representation of a graph. By transforming the protein into a set of connected nodes, we are converting the protein into a graph. 
Therefore, the Kirchhoff matrix can be used to represent the protein, allowing us to go from a biochemistry problem to a linear algebra problem. In this case, the Kirchhoff matrix is the matrix representation of which pairs of residues are connected. There are also some useful properties of the Kirchhoff matrix that we will take advantage of later on. The matrix is constructed as follows:\[\Gamma_{ij} = \begin{cases} & -1 \text{ if $i \neq j$ and $R_{ij} \leq r_c$}\\ & 0 \text{ if $i \neq j$ and $R_{ij} > r_c$} \end{cases}\]\[\Gamma_{ii} = -\sum_j \Gamma_{ij}\]where \(r_c\) is the threshold distance. Simply put, if residue i and residue j are connected, then the value of position i,j in the matrix will be -1. If they are not connected, then the value will be 0. The values of the diagonals, i.e. position i,i, correspond to the total number of connections of residue i. Toy structure and the corresponding Kirchhoff matrix. One of the most common analyses using GNM concerns the coordinated movement between residues as the protein fluctuates. More specifically, we want to see how each residue will move relative to other residues, or the cross-correlation between the residues. Recall that we are representing the fluctuations as vectors (see Gaussian Fluctuations). Therefore, for some residue i and residue j, we are trying to compute how much the fluctuation vector \(\Delta R_i\) points in the same direction as the fluctuation vector \(\Delta R_j\). To do this, we need to compute the inner product of the vectors, denoted by the angle brackets \(\langle \rangle\), which is a generalization of the dot product. In other words, computing the inner product between the fluctuation vectors is equivalent to computing the cross-correlation between the residues. As such, the cross-correlation between residue i and residue j is often represented as \(\langle \Delta R_i \cdot \Delta R_j \rangle\). 
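To make the construction above concrete, here is a minimal NumPy sketch that builds \(\Gamma\) from a set of alpha-carbon coordinates. The function name and toy inputs are our own illustration, not ProDy's API; ProDy performs this step internally when building a GNM.

```python
import numpy as np

def build_kirchhoff(coords, cutoff=7.3):
    """Build the GNM Kirchhoff matrix from an N x 3 array of alpha-carbon coordinates.

    Off-diagonal entries are -1 for pairs within the cutoff distance and 0
    otherwise; each diagonal entry is the number of contacts of that residue.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise distance matrix
    gamma = np.where(dist <= cutoff, -1.0, 0.0)  # connected pairs get -1
    np.fill_diagonal(gamma, 0.0)                 # clear self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # diagonal = contact counts
    return gamma
```

Note that every row (and column) of \(\Gamma\) sums to zero by construction, which is one way to see that its determinant is zero, a property that matters below.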
It turns out that the inner product is proportional to the corresponding element of the inverse of the Kirchhoff matrix, so to find it we need only invert the Kirchhoff matrix. The cross-correlation between some residue i and residue j can be mathematically calculated as follows:\[\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ij}\]where \(k_B\) is the Boltzmann constant, \(\gamma\) is the spring constant (stiffness of the spring), and \(\left[ \Gamma^{-1} \right]_{ij}\) is element ij in the inverted Kirchhoff matrix. Similarly, we can also calculate the expectation values of the fluctuation for each residue, or the mean-square fluctuations, which is the inner product of the fluctuation vector with itself:\[\langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma} \left[ \Gamma^{-1} \right]_{ii}\]From these equations, we can see that the inverse Kirchhoff matrix fully defines both the cross-correlations between residue motions as well as the mean-square fluctuations of the residues.\[\left[ \Gamma^{-1} \right]_{ij} \sim \langle \Delta R_i \cdot \Delta R_j \rangle\]\[\left[ \Gamma^{-1} \right]_{ii} \sim \langle \Delta R_i^2 \rangle\]However, we run into a problem here because we cannot simply invert the Kirchhoff matrix. In linear algebra, a matrix is invertible if and only if its determinant is nonzero. Unfortunately for us, one of the special properties of the Kirchhoff matrix in GNM is that its determinant is zero, and so we cannot directly invert the matrix to get \(\Gamma^{-1}\). Thankfully, there is a method to compute the values of the inverted matrix by performing eigendecomposition on the matrix.\[\Gamma = U \Lambda U^T\]where \(U\) is the orthogonal matrix with the \(k^{th}\) column, represented by \(u_k\), corresponding to the \(k^{th}\) eigenvector of \(\Gamma\), and \(\Lambda\) is the diagonal matrix of eigenvalues, represented by \(\lambda_k\). 
Based on the characteristics of the Kirchhoff matrix (positive semi-definite), the first eigenvalue, \(\lambda_1\), is 0. The remaining \(N-1\) eigenvalues, as well as the eigenvectors in \(U\), actually directly describe the modes of motion that we discussed earlier in this lesson. The elements of eigenvector \(u_k\) describe the distribution of residue displacements, normalized over all the residues, along the \(k^{th}\) mode axis. In other words, the motion of the \(i^{th}\) residue along the \(k^{th}\) mode is described by the \(i^{th}\) element in eigenvector \(u_k\). The corresponding eigenvalue \(\lambda_k\) describes the frequency of the \(k^{th}\) mode, where the smallest \(\lambda\) values correspond to the lowest-frequency, or slowest, modes, which make the largest contribution to the overall protein motion. After eigendecomposition, we can now rewrite the cross-correlation equation as a sum of the N-1 GNM modes:\[\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{3 k_B T}{\gamma} \sum_{k=1}^{N-1} \left[ \lambda_k^{-1} u_k u_k^T \right]_{ij}\]and similarly for the mean-square fluctuation:\[\langle \Delta R_i^2 \rangle = \frac{3 k_B T}{\gamma} \sum_{k=1}^{N-1} \left[ \lambda_k^{-1} u_k u_k^T \right]_{ii}\]Now that we can compute the cross-correlation between residues, we can normalize the values and construct a normalized cross-correlation matrix, \(C^{(n)}\), such that:\[C^{(n)}_{ij} = \frac{\langle \Delta R_i \cdot \Delta R_j \rangle}{\left[ \langle \Delta R_i \cdot \Delta R_i \rangle \langle \Delta R_j \cdot \Delta R_j \rangle \right]^{\frac{1}{2}}}\]where \(C^{(n)}_{ij}\) corresponds to the orientational cross-correlation between residue i and residue j. Because we normalized the values, the range of \(C^{(n)}_{ij}\) is \([-1,1]\), where 1 means the residues are fully correlated in motion, and -1 means the residues are fully anti-correlated in motion. Cross-Correlation Map: Cross-correlation analysis provides useful insight into the structure of the protein. 
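The mode sums above translate directly into code. The sketch below is our own NumPy illustration (not ProDy's implementation): it eigendecomposes \(\Gamma\), discards the zero mode, and returns the normalized cross-correlation matrix \(C^{(n)}\). The prefactor \(3k_BT/\gamma\) is omitted because it cancels in the normalization.

```python
import numpy as np

def normalized_cross_correlations(kirchhoff, n_modes=None):
    """Normalized GNM cross-correlations from a Kirchhoff matrix.

    Sums lambda_k^-1 u_k u_k^T over the nonzero (slowest-first) modes to form
    the pseudo-inverse of Gamma, then normalizes so the diagonal equals 1.
    """
    vals, vecs = np.linalg.eigh(kirchhoff)  # eigenvalues in ascending order
    vals, vecs = vals[1:], vecs[:, 1:]      # drop the zero mode (lambda_1 = 0)
    if n_modes is not None:                 # optionally keep only slowest modes
        vals, vecs = vals[:n_modes], vecs[:, :n_modes]
    ginv = (vecs / vals) @ vecs.T           # pseudo-inverse: sum lambda^-1 u u^T
    msf = np.diag(ginv)                     # proportional to <dR_i^2>
    return ginv / np.sqrt(np.outer(msf, msf))
```

By construction the diagonal of the returned matrix is 1 and all entries lie in \([-1, 1]\), matching the interpretation of \(C^{(n)}\) given above.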
The regions of high correlation coming off the diagonal typically provide information on secondary structures (residues in the same secondary structure will typically move together). On the other hand, high correlation regions not near the diagonal provide information on the tertiary structure of the protein, such as protein domains and clues to which parts of the protein work together. In general, we can observe complex patterns of correlated and anti-correlated movement throughout the protein (both inter- and intrasubunit), which can act as a kind of fingerprint. We can compare the cross-correlation between regions of the same protein or the cross-correlation maps of two similar proteins to find differences in the correlation patterns. This would then provide clues to where the proteins or protein regions differ structurally and possibly functionally. After calculating the cross-correlation for each residue pair, we can organize the data as a matrix and then visualize it as a cross-correlation heat map like the figure below. Normalized cross-correlation heat map of human hemoglobin (1A3N) using the first 20 slowest normal modes. Red regions indicate correlated residue pairs which move in the same direction; blue regions indicate anti-correlated residue pairs which move in opposite directions. In the cross-correlation map of human hemoglobin above, we see four squares of positive correlation along the diagonal. These represent the four subunits of hemoglobin, \(\alpha_1\), \(\beta_1\), \(\alpha_2\), and \(\beta_2\) in that order, along with their intrasubunit correlations. We can differentiate between the two types of subunits by comparing the correlation patterns between the four squares. We see that the same patterns appear in the first and third squares, and in the second and fourth squares. 
Assuming that the first square represents \(\alpha_1\), we can deduce that the third square represents \(\alpha_2\), and that the second and fourth squares represent \(\beta\) subunits. The rest of the cross-correlation map (regions next to the diagonal squares) provides evidence of high intersubunit correlations between \(\alpha_1 \beta_1\)/\(\alpha_2 \beta_2\), some correlation between \(\alpha_1 \beta_2\)/\(\alpha_2 \beta_1\), and minimal correlation between \(\alpha_1 \alpha_2\)/\(\beta_1 \beta_2\). This agrees with experimental analysis of human hemoglobin, which shows extensive, cooperative interactions between \(\alpha\) and \(\beta\) subunits, and minimal interactions between \(\alpha\) subunits and between \(\beta\) subunits 1. Mean-square Fluctuations & B-factor: Just like cross-correlation, we can also visualize the mean-square fluctuations of the residues. This is typically done in two ways. The simplest is to directly plot the values, where the x-axis represents the residues and the y-axis represents the mean-square fluctuation \(\langle \Delta R_i^2 \rangle\). The other, more useful method is to plot the B-factor. When performing crystallography, the displacement of atoms within the protein crystal decreases the intensity of the scattered X-rays, creating uncertainty in the positions of atoms. The B-factor, also known as the temperature factor or Debye-Waller factor, is a measure of this uncertainty, which includes noise from positional variance of thermal protein motion, model errors, and lattice defects. B-factors are reported in addition to the atomic coordinates in the PDB entry. 
One of the main reasons we use B-factors is that they scale with the mean-square fluctuation, such that for atom i:\[B_i = \frac{8 \pi^2}{3} \langle \Delta R_i^2 \rangle\]We can calculate the theoretical B-factors using this equation and GNM analysis, and then compute the correlation with the experimental B-factors that are included in the PDB entry as a simple way to evaluate the GNM analysis. A study in 2009 by Lei Yang et al. compared the experimental and theoretical B-factors of 190 sufficiently different (<50% similarity) protein structures from X-ray crystallography and found the correlation to be about 0.58 on average 2. Below is a plot of the B-factor, synonymous with the mean-square fluctuation, of \(\alpha_1\). Residues with high values are those that fluctuate with greater motion or that have greater positional uncertainty, and are colored red in the figures. In this case, we see that the residues colored in red are generally at the ends of secondary structures in the outer edges of the protein and in loops (segments in between secondary structures). This is expected because protein loops typically contain highly fluctuating residues. (Top): Human hemoglobin colored according to the GNM calculated theoretical B-factors (left) and the experimental B-factors (right). (Bottom): 2D plot comparing the theoretical and experimental B-factors of subunit \(\alpha_1\) (chain A of the protein). \(\alpha_1\) is located at the top left quarter of the protein figure. A correlation coefficient of 0.63 was calculated between the theoretical and experimental B-factors. Slow Modes: A benefit of decomposing the protein fluctuation into individual normal modes is that we are able to observe the characteristics of each slow mode separately, i.e., which residues it affects and to what degree, or the slow mode shape. This is typically done by visualizing the modes as 2D plots where the x-axis is the residue sequence and the y-axis shows each residue's displacement along the mode. 
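The evaluation step described above can be sketched in a few lines of NumPy. The function names are ours, for illustration; since the scaling constant \(8\pi^2/3\) does not affect a correlation coefficient, the comparison really measures how well the shape of the fluctuation profile matches experiment.

```python
import numpy as np

def theoretical_bfactors(msf):
    """Scale mean-square fluctuations <dR_i^2> into B-factors: B_i = (8 pi^2 / 3) <dR_i^2>."""
    return (8 * np.pi ** 2 / 3) * np.asarray(msf, dtype=float)

def bfactor_correlation(msf, experimental_b):
    """Pearson correlation between GNM-derived and experimental B-factors.

    The constant prefactor cancels in the correlation coefficient, so this
    compares only the fluctuation profiles, as in the Yang et al. study."""
    return np.corrcoef(theoretical_bfactors(msf), experimental_b)[0, 1]
```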
Peaks in the plot indicate which region of residues the mode describes, with higher peaks representing greater magnitudes of motion. It is also common to observe the plot of the average of multiple modes to see the collective contribution of the modes. Below is an example of slow mode shape using human hemoglobin. (Top): Visualization of human hemoglobin colored based on GNM slow mode shape. Red represents regions of high mobility and corresponds to peaks in the plot. The first image represents the slowest mode (left) and the second image represents the average of the first 10 slowest modes (right). (Bottom): 2D plot of the slowest mode separated by the four chains of hemoglobin. Similar to cross-correlation, analyzing slow mode shapes will give us insight into the structure of the protein, and comparing slow mode shapes can reveal differences between protein structures. From the shape of the slowest mode of all four chains (subunits), we can see that the shapes for the four subunits of hemoglobin are quite similar. However, it is important to realize that the slowest mode only captures the largest movements of the protein. Therefore, we cannot say with certainty that the four subunits are as structurally similar as the slow mode shapes suggest, although from the cross-correlation map patterns and experimental studies, we know that subunit \(\alpha\) and subunit \(\beta\) are similar but have structural differences. As mentioned before, we can also view the average shape of the modes. Below is the slow mode plot of the slowest ten modes of hemoglobin. Here, we can see a stark difference between two groups of subunits/chains, where the \(\alpha\) subunits (chains A and C) share a very similar slow mode shape while the \(\beta\) subunits (chains B and D) share a different, yet internally similar, slow mode shape. The average mode shape of the slowest ten modes of human hemoglobin using GNM. There are two more plots commonly used in mode analysis. 
The first is called the frequency dispersion of the modes, which is a plot representing the frequency of each mode. The y-axis represents the reciprocal of the corresponding eigenvalue of the mode, where a higher value indicates a slow mode with low frequency, which is expected to be highly related to biological function. The frequency dispersion of modes in human hemoglobin. Higher values indicate low-frequency, slower modes that are likely to be highly relevant to biological functions. The degree of collectivity is a measure of the extent of structural elements, in this case residues, that move together for each mode. The degree of collectivity of the \(k^{th}\) mode is calculated by the following equation:\[\mathrm{Collectivity}_k = \frac{1}{N} e^{- \sum^N_i \Delta R_i^2 \ln \Delta R_i^2}\]where N is the total number of residues. A high degree of collectivity indicates that the mode is highly cooperative and engages a large portion of the structure. A low degree of collectivity indicates that the mode only affects a small region. Modes with a high degree of collectivity are generally believed to be functionally relevant modes and are usually found at the low frequency end of the mode spectrum. The degree of collectivity of modes in human hemoglobin. Higher values indicate modes that describe a large portion of the protein while low values indicate modes that describe small local regions. ANM: The anisotropic counterpart to GNM, in which the direction of fluctuations is also considered, is called the anisotropic network model (ANM). The main difference in ANM analysis is that a Hessian matrix, \(H\), is used in place of the Kirchhoff matrix. Each element \(H_{ij}\) in the matrix is a 3x3 matrix that contains anisotropic information about the orientation of node i and node j. The calculations proceed similarly to GNM, where eigendecomposition is used to calculate cross-correlations and mean-square fluctuations. 
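The collectivity formula can be sketched as follows. This is our own NumPy illustration; note that the squared fluctuations must be normalized to sum to 1 before taking the entropy-like sum, which the displayed formula leaves implicit.

```python
import numpy as np

def collectivity(mode_displacements):
    """Degree of collectivity of a mode: (1/N) * exp(-sum p_i ln p_i).

    p_i are the squared residue displacements normalized to sum to 1.
    A fully delocalized mode gives 1; a mode confined to one residue gives 1/N.
    """
    p = np.asarray(mode_displacements, dtype=float) ** 2
    p = p / p.sum()              # normalize squared displacements
    n = len(p)
    p = p[p > 0]                 # drop zeros; 0*ln(0) contributes nothing
    return float(np.exp(-(p * np.log(p)).sum()) / n)
```

For example, a uniform mode over all residues yields a collectivity of 1, while a mode that moves only a single residue yields 1/N, matching the interpretation in the text.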
Although ANM includes directionality, it typically performs worse than GNM when compared with experimental data 3. However, this model offers the benefit of creating animations depicting the range of motions and fluctuations of the protein because of the inclusion of orientation. We will not go in depth regarding the intricacies of ANM calculations in this module, but we will use ANM for the purpose of creating animations to visualize protein fluctuations. Below is the animation of hemoglobin showing the ANM calculated fluctuation. Here, we can clearly see a distinction in the direction of fluctuation between the left and right sides of the protein, separated by immobile regions. Collective motions of the slowest mode in human hemoglobin from ANM calculations using DynOmics. Remember that hemoglobin is essentially a dimer of alpha-beta dimers. In this animation, the left and right sides both represent an alpha-beta dimer. This separation in fluctuation supports our previous cross-correlation analysis, where the \(\alpha_1 \beta_1\)/\(\alpha_2 \beta_2\) interface is a dynamically variable (highly fluctuating) region due to cooperative interactions between \(\alpha\) and \(\beta\) subunits of the same dimer. The other interfaces, \(\alpha_1 \alpha_2\)/\(\beta_1 \beta_2\) and \(\alpha_1 \beta_2\)/\(\alpha_2 \beta_1\), are not dynamically variable regions because there are minimal interactions between subunits from differing dimers. Human hemoglobin exists in two states, the T (tense) state and the R (relaxed) state. The T state represents hemoglobin in its deoxy form, where it lacks an oxygen species. The R state represents the fully oxygenated form of hemoglobin. Our GNM/ANM results, which pointed to the highly fluctuating, dynamically variable \(\alpha_1 \beta_1\)/\(\alpha_2 \beta_2\) interface, are in accordance with an important biological function of hemoglobin: its ability to transition from the T state to the R state. 
This state transition involves the rearrangement of the interface, where salt-bridges and contacts can shift up to 7 Å 4. As such, the GNM/ANM analysis revealed that these interfaces are highly mobile and fluctuate greatly. This concludes the module’s introduction to the concepts behind GNM analysis. For those interested, a full treatment of the mathematics of GNMs can be found in the chapter at https://www.csb.pitt.edu/Faculty/bahar/publications/b14.pdf. Performing GNM calculations on proteins in the PDB: In the tutorial linked below, we will demonstrate how to easily perform GNM analysis on proteins in the Protein Data Bank by using a web portal called DynOmics. DynOmics is made up of multiple components that allow us to perform GNM calculations and visualize the results as plots and figures like the ones seen in the previous section. These components include iGNM 2.0, a database of pre-computed GNM dynamics for all PDB structures, ANM 2.0, a server for visualizing animations created using ANM, and ENM 1.0, a server with a unifying user-friendly interface for performing all GNM/ANM calculations and evaluations. In the tutorial, we will perform GNM calculations on the SARS-CoV-2 Spike protein and visualize the results. Visit tutorial. GNM analysis of SARS-CoV-2 Spike Protein: In the tutorial, we performed and visualized the GNM results of the SARS-CoV-2 Spike protein. Here, we will analyze the slow mode shapes and cross-correlation heat map, and then compare them with the GNM results of the SARS-CoV Spike protein (PDB: 5xlr). Slow Mode Shape of SARS-CoV-2 Spike: First, we will look at the average slow mode shape of the first ten slow modes. Recall from our hemoglobin example that peaks in the mode shape indicate regions of high flexibility/fluctuation. 
Below is the slow mode shape and visualization of the two Spike proteins, using the colors red for high flexibility and blue for low flexibility. Average mode shape of the slowest ten modes of SARS-CoV-2 Spike (left) and SARS-CoV Spike (right). The first peak corresponds to the N-Terminal Domain (NTD) and the second peak corresponds to the Receptor Binding Domain (RBD). The results show that the NTD and RBD of the SARS-CoV-2 Spike protein are highly flexible, which agrees with the biological functions of these regions. As we have learned, the RBD is responsible for the interaction with ACE2 on human cells. During this interaction, the RBD of one of the three chains “opens” up, exposing itself to more easily bind with ACE2. Therefore, the flexibility in this region makes sense. The other peak corresponds to the NTD of the Spike protein. Similar to the RBD, the NTD of the Spike protein also mediates viral infection, but by interacting with DC-SIGN and L-SIGN receptors rather than ACE2 5. DC-SIGN (dendritic cell-specific intercellular adhesion molecule-3 grabbing non-integrin) and L-SIGN (liver/lymph node-specific ICAM-3 grabbing non-integrin) are closely related C-type lectins that are present on macrophages and dendritic cells. This allows SARS-CoV-2 to infect different tissues such as the lungs, where ACE2 expression levels are low. The ability to infect lung cells contributes to pneumonia, the main symptom of severe COVID-19 cases. Just like the RBD, high flexibility in this domain allows the Spike protein to more easily come into contact and interact with these receptors. Comparing with the slow mode shape of the SARS-CoV Spike protein, we see that the RBD peak is slightly lower in SARS-CoV-2 Spike, but the overall slow mode shape is very similar. 
This supports the similarity between the two proteins in both function and structure. Cross-Correlation of SARS-CoV-2 Spike: Next, we will look at the cross-correlation heat maps of SARS-CoV-2 Spike and SARS-CoV Spike. Cross-correlation heat map of SARS-CoV-2 Spike (left) and SARS-CoV Spike (right). Along the central diagonal, three identical substructures (boxed in orange) representing the three identical chains of the Spike protein can be seen. There are three regions of high correlation within each chain, representing the NTD, RBD, and the S2 domain. Throughout the heat map, the correlation patterns are shared between the two proteins (boxed in blue). Following the central diagonal, we see three identical substructures within each protein. Remember that the Spike proteins are trimers of identical chains (boxed in orange). Within each chain, we see three distinct regions of highly positive correlation: the NTD, RBD, and S2 domain. The protein structure of the Spike protein can be separated into two main parts, the S1 and S2 domains. The S1 domain includes the NTD and RBD, and is largely responsible for receptor interactions. The S2 domain includes the ‘stalk’ of the spike protein, which is largely responsible for membrane fusion during infection. Between the two proteins, we see essentially the same three identical substructures along the central diagonal. In addition, we see that the correlation patterns in the off-diagonal regions (boxed in blue) are shared between the two proteins. Taken together, the results of the cross-correlation indicate that the SARS-CoV-2 and SARS-CoV Spike proteins are highly similar in structure and likely have the same or similar function. Summing Up: In this module, we have discussed a wide range of computational methods surrounding the analysis of proteins. We began with a discussion of the fundamental problem of determining a protein’s structure. 
Because experimental methods for identifying protein structure are costly and time-consuming, we transitioned to discuss algorithmic approaches that do a good job of predicting a protein’s structure from its sequence of amino acids.We then turned to the problem of comparing structures for related proteins, with a lengthy case study on comparing the SARS-CoV and SARS-CoV-2 spike protein structures. We saw that the problem of quantifying the “difference” between two shapes is more challenging than it might seem, and we established both global and local structure comparison metrics. We applied these approaches to isolate three candidate regions of the SARS-CoV-2 spike protein that seem to bind better to the ACE2 enzyme, and we quantified this binding using a localized energy function.We then saw that to infer a protein’s function, we need to move from studying structure to molecular dynamics, studying how the protein behaves within its environment as it flexes and bends in order to interact with other molecules.This is a great deal of ground to have covered, but if we would like to present an ultimate moral to this chapter, it is that biology is an extremely complex subject. The structure prediction problem is decades old and still not fully solved, and computational approaches for studying protein structure and dynamics are sophisticated. But there is just as much that we have left undiscussed. What happens after the spike protein binds to ACE2? How does the virus enter the cell? How does it replicate itself? How does it fight our immune systems, and how can we design a vaccine to fight back? We would need far more time than we have here to treat all of these topics, but if you are interested in an online course covering some of them, then check out the free online course SARS Wars: A New Hope by our colleague Christopher James Langmead.Mutations and the emergence of new strainsOne of the main characteristics of life is the ability to reproduce. 
Of course, this includes the replication of genetic material, whether it be DNA or RNA. However, the replication process is not completely error-proof, and the biological machinery can make mistakes. These changes in the genetic material are called mutations and are the driving force in evolution. These mutations are often harmful to the organism or can have little to no effect. On rare occasions, a mutation can enhance the organism, allowing it to outcompete members of the same species and pass down the positive mutation to its offspring. As time passes, more and more members of the species will have accumulated mutations and may eventually be considered a new species or variant of the species depending on how much the genetic material has changed. Although scientists are still debating whether viruses are alive, viruses are still involved in genetic replication, albeit by hijacking the host’s biological machinery. Nonetheless, the constant replication of viruses often leads to mutations and the creation of new strains or variants of the virus. Why else do we need annual flu shots?With the widespread rate of infection of COVID-19, it is inevitable for mutations to occur and create variants of the virus. In fact, there are already multiple strains circulating globally. The better-known variants, as of January 2021, are variant B.1.1.7 in the United Kingdom, variant B.1.351 in South Africa, and variant P.1 in Brazil. From observations, it appears that these new variants are more infectious and can spread more easily 6. However, much remains unknown.There are important questions that need to be answered: How far have the variants spread? How do they differ from current variants? How do their infectivity and severity differ? 
How will they respond to current vaccines and treatments?As COVID-19 continues to circulate, new variants will continue to emerge, meaning that this is still an active area of study.Thus concludes the third module of this course. In the course’s final module, we will turn our attention to a very different type of problem. To fight a virus like SARS, your body employs a cavalry of white blood cells. Maintaining healthy levels of these cells is vital to a strong immune system, and blood reports run counts of these cells to ensure they are within normal ranges. Can we teach a computer to run this analysis automatically?We hope you will join us to find out! Garrett, R. H., Grisham, C. M., 2010. Biochemistry, 4th ed. Brooks/Cole, Cengage Learning. ↩ Yang, L., Song, G., & Jernigan, R. L. 2009. Comparisons of experimental and computed protein anisotropic temperature factors. Proteins, 76(1), 164–175. https://doi.org/10.1002/prot.22328 ↩ Yang, L., Song, G., Jernigan, R. 2009. Protein elastic network models and the ranges of cooperativity. PNAS 106(30), 12347-12352. https://doi.org/10.1073/pnas.0902159106 ↩ Davis, M., Tobi, D. 2014. Multiple Gaussian network modes alignments reveals dynamically variable regions: The hemoglobin case. Proteins: Structure, Function, and Bioinformatics, 82(9), 2097-2105. https://doi.org/10.1002/prot.24565 ↩ Soh, W. T., Liu, Y., Nakayama, E. E., Ono, C., Torii, S., Nakagami, H., Matsuura, Y., Shioda, T., Arase, H. The N-terminal domain of spike glycoprotein mediates SARS-CoV-2 infection by associating with L-SIGN and DC-SIGN. ↩ New COVID-19 Variants. 2021. Retrieved January 27, 2021, from https://www.cdc.gov/coronavirus/2019-ncov/transmission/variant.html ↩ "
} ,
{
"title" : "Contact Us",
"category" : "",
"tags" : "",
"url" : "/contact/",
"date" : "",
"content" : "Work on this project is ongoing. If you have any questions about this project, or to report typos/bugs, please use the form below.We also would be happy to hear from you if you are a learner who is interested in providing a testimonial about how this course has been useful to you.Finally, if you are an instructor who is interested in adopting this course in your own teaching, whether in full or in individual pieces, please let us know as we are forming a network of instructors adopting this course.We look forward to hearing from you!"
} ,
{
"title" : "Contents",
"category" : "",
"tags" : "",
"url" : "/contents/",
"date" : "",
"content" : "This online course is divided into modules. Each module covers a single biological topic (e.g., “Analyzing the structure of the coronavirus spike protein”) and introduces all of the modeling topics needed to address that topic from first principles.Each module has a main narrative that can be explored by anyone, including beginners. When we need to build a model along the way, we pass our modeling work to “software tutorials” that show how to use high-powered modeling software produced by MMBioS researchers in order to build biological models. The software tutorials allow users wishing to get their hands dirty with modeling software to build all of the models that we need in this course. This allows for a course that can be explored by casual and serious biological modeling learners alike.After building a model in a software tutorial, we return to the main text and analyze the results of this model. In this way, the text forms a constant interplay between establishing a biological problem, describing how a model works, implementing that model in a software tutorial, and returning to the text to analyze the model and ask our next question, beginning the cycle anew.The following contents are a work in progress and will be expanded in the coming weeks. 
For now, you can find links to the start of each published module below.Prologue: An introduction to biological modeling via random walks and Turing patternsMain text Introduction: Life is random Alan Turing and the zebra’s stripes An introduction to random walks A reaction-diffusion model generating Turing patterns The Gray-Scott model: a Turing pattern cellular automaton Conclusion: Turing patterns are fine-tuned Software tutorials (featuring MCell and CellBlender) Simulating particle diffusion with CellBlender Generating Turing patterns with a reaction-diffusion simulation in CellBlender Building a diffusion cellular automaton with Jupyter notebook Implementing the Gray-Scott reaction-diffusion automaton with Jupyter notebook Module 1: Finding motifs in transcription factor networksMain text Introduction: Networks rule biology Transcription and DNA-protein binding Transcription factor networks Using randomness to verify network motifs The negative autoregulation motif The feedforward loop motif Building a biological oscillator Conclusion: the importance of robustness in biological oscillations Software tutorials (featuring MCell and CellBlender) Hunting for loops in transcription factor networks Comparing simple regulation to negative autoregulation Ensuring a mathematically controlled simulation for comparing simple regulation to negative autoregulation Implementing the feed-forward loop motif Implementing the repressilator: a biological oscillator Perturbing the repressilator Module 2: Unpacking E. coli’s genius exploration algorithmMain text Introduction: The lost immortals Bacterial runs and tumbles Signaling and ligand-receptor dynamics Stochastic simulation of chemical reactions A biochemically accurate model of bacterial chemotaxis Methylation helps a bacterium adapt to differing concentrations Modeling a bacterium’s response to an attractant gradient Conclusion: the beauty of E. 
coli’s robust randomized exploration algorithm Software tutorials (featuring BioNetGen) Getting started with BioNetGen and modeling ligand-receptor dynamics Adding phosphorylation to our BioNetGen model Modeling bacterial adaptation to changing attractant Traveling up an attractant gradient Traveling down an attractant gradient Modeling a pure random walk strategy Modeling E. coli’s sophisticated random walk algorithm Comparing different chemotaxis default tumbling frequencies Module 3: Analyzing the coronavirus spike protein Introduction: A tale of two doctorsPart 1: Protein structure prediction An introduction to protein structure prediction Ab initio protein structure prediction Homology modeling for protein structure prediction Comparing protein structures to assess model accuracy Part 1 conclusion: protein structure prediction is solved! (Kinda…) Part 2: Comparing SARS-CoV-2 and SARS-CoV Searching for local differences in the SARS-CoV and SARS-CoV-2 spike proteins Analyzing structural differences in the bonding of SARS-CoV and SARS-CoV-2 with the ACE2 enzyme Quantifying the interaction energy between the SARS-CoV-2 spike protein and ACE2 Part 2 conclusion: from static protein analysis to molecular dynamics Software tutorials (featuring ProDy) Using ab initio modeling to predict the structure of hemoglobin subunit alpha Using homology modeling to predict the structure of the SARS-CoV-2 spike protein Using RMSD to compare the predicted SARS-CoV-2 spike protein against its experimentally validated structure Finding local differences in the SARS-CoV and SARS-CoV-2 spike protein structures Visualizing specific regions of interest within the spike protein structure Computing the energy contributed by a local region of the SARS-CoV-2 spike protein bound with the human ACE2 enzyme Molecular dynamics analysis of coronavirus spike proteins using GNM Adding directionality to spike protein GNM simulations using ANM Module 4: Training a computer to count white blood cells 
automaticallyComing soon!Featured software: CellOrganizer"
} ,
{
"title" : "Exercises: Coming Soon!",
"category" : "",
"tags" : "",
"url" : "/coronavirus/exercises",
"date" : "",
"content" : " Good exercise: find centroid of a given shape. exercise: compute RMSD. Good exercise later: compute Q scores for the protein structure comparison that we performed at the end of part 1. Good exercise: compute Qres for very simple proteins Exercise based on the following excellent observation: In the case of RMSD, I believe that they assign the RMSD of an alignment between a residue and gap to be 0, effectively ignoring it. I believe this is how it is ensured that the two sets have the same number of points (alpha carbons) and also one of the shortcomings of using RMSD. I think for prody, you can set a gap penalty during chain matching, “Thus, gap-filled alignments focusing on low RMSDs, while accurate and useful for superposition of structures, are sub-optimal for machine learning as the features of many potentially relevant residues are discarded due to a lack of data in those positions. In most cases, positions with over a certain percentage of aligned residues are considered, with gaps replaced by zeros or by the average of the feature values in that position [22].” (RMSD with gap. A Jupyter notebook of this exercise with the necessary files is also included in the email.) In this exercise, we will see what can happen to RMSD calculations when there is a gap in sequence alignment between two proteins. Let’s use our homology modeling result robetta4 (single chain of SARS-CoV-2 Spike) and the associated SARS-CoV-2 Spike model 6vxx from the PDB.First, calculate the RMSD between the two models by following the RMSD tutorial and using the chain A to chain A matching (matches[0][0] & matches[0][1]). You should get an RMSD of about 2.5853.If you followed the tutorial, robetta4 should be under the variable struct1. We will create a new variable, struct3, by taking the sequence of robetta4 and deleting a large selection. 
We can create the variable by using:struct3 = struct1.select('resid 1 to 400 or resid 601 to 20000')We use a large value '20000' to ensure that the rest of the protein is captured. Variable struct3 will represent the robetta4 model that has a gap/deletion at residues 401 to 600 (a 200-residue gap). Now we will repeat the RMSD calculation using struct3 instead of struct1. You should get an RMSD of about 2.1927. Is this what you expected?(There are fewer residues to compare and fewer deviations to consider, which may have contributed to the decreased RMSD score.) Why are contact maps and cross correlation maps “symmetric” about the main diagonal? Something on identifying a dynamics difference from a contact map or better cross-correlation in similar proteins. If you have not already done so, try modeling the SARS-CoV-2 S protein or RBD using SWISS-MODEL, Robetta, or GalaxyWEB using the steps in the Homology Structure Prediction Tutorial. Then, use ProDy to calculate the RMSD between your models and the PDB entries 6vxx for the S protein and 6lzg for the RBD. Did your models perform better than our models? Visualize your best performing model(s) and the corresponding PDB entry in VMD. If the models are sufficiently similar, try performing a structural alignment using Multiseq and see where in the structure your predicted models did well. Using VMD, model the SARS-CoV-2 S (6vxx) protein and SARS S (5x58) protein. Create the graphical representation of glycans and compare the number of glycans between the two proteins. Are they any different? Could this possibly be another reason why SARS-CoV-2 is more infectious than SARS? In our GNM tutorial, we created the contact map using the threshold of 20Å. Try making the contact map of one of the chains of SARS-CoV-2 S protein 6vxx with different thresholds. Do the maps look different? In this module, we only used homology modeling for large molecules such as the SARS-CoV-2 S protein and the RBD. 
It would be interesting to directly compare the accuracy of homology modeling and ab initio modeling. Try using one of the three homology modeling programs to predict the structure of the human hemoglobin subunit (sequence). After you get your predicted models, try calculating the RMSD using the PDB entry 1si4. How do they compare to the RMSD from our ab initio (QUARK) models? "
} ,
{
"title" : "Extra",
"category" : "",
"tags" : "",
"url" : "/coronavirus/extra",
"date" : "",
"content" : "Part 1 Earlier, point out that no videos exist on protein folding on YouTube because this happens in 1/100 to 1/1000th of a second at a molecular level, so it has not been observed. Earlier, fix issue that RMSD of a short protein is smaller. Better: a single aa can disrupt it, so less likely to have minuscule RMSD with a long protein. Protein shape determines binding affinity (from structure intro but possibly just delete this)Now that we understand the importance of shape in determining how proteins interact with molecules in their environment, we will spend some time discussing how these interactions are modeled.The simplest model of protein interactions is Emil Fischer’s lock and key model, which dates back to 1894 [^Fischer]. This model considers a protein that is an enzyme, which serves as a catalyst for a reaction involving another molecule called a substrate, and we think of the substrate as a key fitting into the enzyme lock. If the substrate does not fit into the active site of an enzyme, then the reaction will not occur.However, proteins are flexible, a fact that we will return to when we discuss the binding of the coronavirus spike protein to a human enzyme in a later lesson. Because of this flexibility, Daniel Koshland introduced a modified model called the induced fit model in 1958.[^Koshland] In this model, the enzyme and substrate may not fit perfectly, nor are they rigid like a lock and key. Rather, the two molecules may fit inexactly, changing shape as they bind to mold together more tightly. That having been said, if an enzyme’s and substrate’s shapes do not match well with each other, then they will not bind. For an overview of the induced fit model, please check out this excellent video from Khan Academy.Extra ab initio Models published before crystallography can be found here: SSGCID Models From Chris: For more information on how to calculate the energies and the functions for potential energy, click here. 
 Extra – threading We need to make sure that the specification of the .pdb file type comes back somewhere before we give the results. It may be a perfect place to do so in this lesson. Perhaps something about how threading works. Fact is that even if a protein doesn’t have a homologous protein in a database, most proteins will still have a protein of very similar structure. Unfortunately, there are occasions where no identified proteins have notable sequence similarities. The alternative is to use threading, or fold recognition. In this case, rather than comparing the target sequence to sequences in the database, this method compares the target sequence to structures themselves. The biological basis of this method is that in nature, protein structures tend to be highly conserved, and unique structural folds are therefore limited. A simple explanation of the general threading algorithm is that structure predictions are created by placing or “threading” each amino acid in the target sequence onto template structures from a non-redundant template database, and then assessing how well it fits with some scoring function[^score]. Then, the best-fit templates are used to build the predicted model. The scoring function varies per algorithm, but it typically accounts for secondary structure compatibility, gap penalties during alignment, and other terms that depend on amino acids that are brought into contact by the alignment. Each program has its own algorithms and methods of assembly, such as how to decide which templates to use, how to use the templates, and how to fill in blurry areas (regions with no good matches to templates). Nevertheless, the three programs essentially build the models by assembling varying fragments from templates. If you would like to learn more about the intricacies of each program, you can follow these links: Robetta, Galaxy, SWISS-MODEL. "
} ,
{
"title" : "Glycans",
"category" : "",
"tags" : "",
"url" : "/coronavirus/glycans",
"date" : "",
"content" : "The surfaces of viruses and host cells are not smooth, but rather “fuzzy”. This is because the surface is decorated by structures called glycans, which consist of numerous monosaccharides linked together by glycosidic bonds. Although this definition is also shared with polysaccharides, glycans typically refer to the carbohydrate portion of glycoproteins, glycolipids, or proteoglycans [^Dwek]. Glycans have been found to have structural and modulatory properties and are crucial in recognition events, most commonly by glycan-binding proteins (GBPs) [^Varki]. In viral pathogenesis, glycans on host cells act as primary receptors, co-receptors, or attachment factors that are recognized by viral glycoproteins for viral attachment and entry. On the other hand, the immune system can recognize foreign glycans on viral surfaces and target the virus [^Raman]. Unfortunately, some viruses have evolved methods that allow them to effectively conceal themselves from the immune system. One such method is a glycan shield. By covering the viral surface and proteins with glycans, the virus can physically shield itself from antibody detection. Because the virus replicates by hijacking the host cells, the glycan shield can consist of host glycans and mimic the surface of a host cell. A notorious virus that utilizes glycan shielding is HIV. 
In the case of SARS-CoV-2, the immune system recognizes the virus through specific areas, or antigens, along the S protein. The S protein, however, is a glycoprotein, meaning that it is covered with glycans which can shield the S protein antigens from being recognized.In our last tutorial, we will use VMD to try to visualize the glycans of the SARS-CoV-2 S protein.Visit tutorialFrom the visualization we created in the tutorial, we can see that glycans are present all around the S protein. In fact, the glycans cover around 40% of the Spike protein[^Grant]! This raises an important question: If the glycans on the S protein can hide it from antibodies, won’t they get in the way of binding with ACE2? Such glycosylation does not hinder the Spike protein’s ability to interact with human ACE2 because the Spike protein is able to adopt an open conformation, allowing a large portion of the RBD to be exposed. In the figure below, we compared the SARS-CoV-2 Spike in its closed conformation (PDB entry: 6vxx) and the SARS-CoV-2 Spike in its open conformation (PDB entry: 6vyb). The presumed glycans are shown in red. Notice how the RBD in the orange chain is much more exposed in the open conformation.This figure shows the SARS-CoV-2 S protein in the closed conformation (left) and the protein with an open conformation of one chain (right) using the PDB entries 6vxx and 6vyb, respectively. The protein chains are shown in dark orange, yellow, and green. The presumed glycans are shown in red. Notice how in the open conformation, the RBD of one of the chains points upward, exposing it for ACE2 interactions.Glycans are generally very flexible and have large internal motions that make it difficult to get an accurate description of their 3D shapes. Fortunately, molecular dynamics (MD) simulations can be employed to predict the motions and shapes of the glycans. With a combination of MD and visualization tools (e.g. 
VMD), detailed snapshots of the glycans on the S protein can be created.Snapshots from molecular dynamics simulations of the SARS-CoV-2 S protein with different glycans shown in green, yellow, orange, and pink. Source: https://doi.org/10.1101/2020.04.07.030445 [^Grant]SARS-CoV-2 VaccineMuch of vaccine development for SARS-CoV-2 has been focused on the S protein given that it facilitates viral entry into host cells. In vaccine development, it is critical to understand every strategy that the virus employs to evade immune response. As we have discussed, SARS-CoV-2 hides its S protein from antibody recognition through glycosylation, creating a glycan shield around the protein. In fact, the “stalk” of the S protein has been found to be completely shielded from antibodies and other large molecules. In contrast, the “head” of the S protein is vulnerable because the RBD is less glycosylated and becomes fully exposed in the open conformation. Thus, there is an opportunity to design small molecules that target the head of the protein [^Casalino]. Glycan profiling of SARS-CoV-2 is extremely important in guiding vaccine development as well as improving COVID-19 antigen testing [^Watanabe].Sources Dwek, R.A. Glycobiology: Toward Understanding the Function of Sugars. Chem. Rev. 96(2), 683-720 (1996). https://pubs.acs.org/doi/10.1021/cr940283b ↩ Varki A, Lowe JB. Biological Roles of Glycans. In: Varki A, Cummings RD, Esko JD, et al., editors. Essentials of Glycobiology. 2nd edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 2009. Chapter 6. https://www.ncbi.nlm.nih.gov/books/NBK1897/ ↩ Raman, R., Tharakaraman, K., Sasisekharan, V., & Sasisekharan, R. Glycan-protein interactions in viral pathogenesis. Current opinion in structural biology, 40, 153–162 (2016). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5526076/ ↩ Grant, O. C., Montgomery, D., Ito, K., & Woods, R. J. 
Analysis of the SARS-CoV-2 spike protein glycan shield: implications for immune recognition. bioRxiv : the preprint server for biology, 2020.04.07.030445. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7217288/ ↩ "
} ,
{
"title" : "Introduction: How Are Blood Cells Counted?",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/home",
"date" : "",
"content" : "by Phillip Compeau, with software tutorials by Nicole MatamalaYour doctor sometimes wants to count your blood cells to ensure that they are within healthy ranges as part of a complete blood count. Blood cells are divided into red blood cells (RBCs), which transport oxygen via a molecule called hemoglobin, and white blood cells (WBCs), immune system cells that help to identify and attack foreign cells.The classic device for counting blood cells is the hemocytometer. As illustrated in the video below, after a technician filters a tiny amount of blood onto a gridded slide, they then count the number of cells of a given type in squares on the grid. This number can then be multiplied to give an estimate of the number of cells within a larger volume of blood.STOP: What problems can you imagine might happen with using a hemocytometer to estimate blood cell counts?The hemocytometer is a simple, even elegant device, but you would not be wrong if you think it seems a bit old-fashioned. In fact, it was invented by Louis-Charles Malassez 150 years ago. Can we hope for a more modern approach in which we train a computer to count blood cells?In this module, we will focus on identifying WBCs in cellular images, which can provide a great deal of information to scientists counting these cells. A low WBC count may indicate a host of diseases that leave the immune system susceptible to attack; a high WBC count may indicate that an infection is present, or that a disease like leukemia has caused overproduction of WBCs.WBCs further divide into subclasses based on their structure and function, and some other diseases may cause an abnormally low or high count of a specific subclass of WBCs. We therefore aim not only to identify WBCs in cellular images but also to classify these WBCs into their appropriate types.We will work with a dataset containing blood cell images depicting both RBCs and WBCs. 
As shown in the figure below, these images contain the three main families of WBCs: granulocytes, lymphocytes, and monocytes. Granulocytes have a multilobular nucleus, which consists of several round “lobes” that are linked by thin strands of nuclear material. Monocyte and lymphocyte nuclei only have a single lobe, but the resulting shapes of the nuclei are quite different: lymphocyte nuclei tend to have a more rounded shape (taking up a greater fraction of the cell’s volume), whereas monocyte nuclei have a more irregular shape. Granulocyte Monocyte Lymphocyte Three images from the blood cell image dataset showing three types of WBCs. (Left) A specific subtype of granulocyte called a neutrophil, illustrating the multilobular structure of this WBC family. (Center) A monocyte with a single, irregularly-shaped nucleus. (Right) A lymphocyte with a round nucleus. (In the provided dataset, these cells correspond to image IDs 3, 15, and 20, respectively.)Our goal is twofold: first, can we excise the WBCs from the images? Second, can we train a computer to classify these WBCs by family? To perform these tasks, we will enlist CellOrganizer, a powerful software resource that can perform automated analyses on cellular images.When you look at the cells in the figure above, you may think that our tasks will be easy. After all, identifying WBCs is simply a matter of excising the large purplish regions. But our eyes are the result of billions of years of evolution to be able to identify patterns and differentiate objects. For that reason, we will see that training a computer to “see” these images in order to separate and classify WBCs is trickier than you might think.Next lesson"
} ,
{
"title" : "Introduction: The Lost Immortals",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home",
"date" : "",
"content" : "by Shuanger Li and Phillip CompeauThe book What If?1, by Randall Munroe, compiles a collection of crazy scientific hypotheticals, paired with thorough discussions of what might happen if these situations occurred. Here is an example, called “Lost Immortals”. If two immortal people were placed on opposite sides of an uninhabited Earth-like planet, how long would it take them to find each other? 100,000 years? 1,000,000 years?One could imagine many ideas for how the immortals might find each other. For example, they could avoid the interiors of continents by moving to the coastlines. If they are allowed to discuss how to find each other in advance, then they could just agree to meet at the planet’s North Pole — assuming that the planet lacks polar bears.But Munroe provides a solution to Lost Immortals that is both sophisticated and elegant. His proposed approach is quoted below. If you have no information, walk at random, leaving a trail of stone markers, each one pointing to the next. For every day that you walk, rest for three. Periodically mark the date alongside the cairn. It doesn’t matter how you do this, as long as it’s consistent. You could chisel the number of days into a rock, or lay out rocks to plot the number. If you come across a trail that’s newer than any you’ve seen before, start following it as fast as you can. If you lose the trail and can’t recover it, resume leaving your own trail. You don’t have to come across the other player’s current location; you simply have to come across a location where they’ve been. You can still chase one another in circles, but as long as you move more quickly when you’re following a trail than when you’re leaving one, you’ll find each other in a matter of years or decades. And if your partner isn’t cooperating—perhaps they’re just sitting where they started and waiting for you—then you’ll get to see some neat stuff.You may be wondering what Lost Immortals has to do with biological modeling. 
In the previous two modules, we have already seen the power of randomness to provide answers to practical questions. Lost Immortals offers another benefit of randomness in the form of a randomized algorithm, or a method that employs randomness to solve a problem.In fact, Munroe’s randomized algorithm for Lost Immortals is inspired by nature; he calls the above approach “be an ant” because it mimics how ants explore their environment for resources. However, in this module, we will see that this algorithm is also similar to the method of exploration undertaken by a much smaller organism: our old friend E. coli.Like other prokaryotes, E. coli is tiny, with a rod-shaped body that is 2µm long and 0.25 to 1µm wide.2 In exploring a vast world with sparse resources, E. coli finds itself in a situation comparable to the Lost Immortals.The video below shows a collection of E. coli surrounding a sugar crystal. Think of this video the next time you leave a slice of cake out on the kitchen counter!The movement of organisms like the bacteria in the above video in response to a chemical stimulus is called chemotaxis. E. coli and other bacteria have evolved to move toward attractants (e.g., glucose, electron acceptors) and away from repellents (e.g., Ni2+, Co2+). But how?In this module, we will dive into the chemotaxis process and ask a number of questions. How does a simple organism like E. coli sense an attractant or repellent in its environment? How does the bacterium change its internal state in response to this environment? And how does the internal response to a stimulus translate into an interpretable “algorithm” that the wandering E. coli implements to explore its environment?Next lesson Randall Munroe. What If? Available online ↩ Pierucci O. 1978. Dimensions of Escherichia coli at various growth rates: Model of envelope growth. Journal of Bacteriology 135(2):559-574. Available online ↩ "
} ,
{
"title" : "Introduction: A Tale of Two Doctors",
"category" : "",
"tags" : "",
"url" : "/coronavirus/home",
"date" : "",
"content" : "by Chris Lee and Phillip Compeau One of the world’s most important warning systems for a deadly new outbreak is a doctor’s or nurse’s recognition that some new disease is emerging and then sounding the alarm. It takes intelligence and courage to step up and say something like that, even in the best of circumstances. Tom Inglesby 1, Director of the Center for Health Security at Johns Hopkins Bloomberg School of Public HealthThe world’s fastest outbreakOn February 21, 2003, a Chinese doctor named Liu Jianlun flew to Hong Kong to attend a wedding and checked into Room 911 of the Metropole Hotel. The next day, he became too ill to attend the wedding and was admitted to a hospital. Two weeks later, Dr. Liu was dead.On his deathbed, Dr. Liu stated that he had recently treated sick patients in Guangdong Province, China, where a highly contagious respiratory illness had infected hundreds of people. The Chinese government had made brief mention of this incident to the World Health Organization but had concluded that the likely culprit was a common bacterial infection. By the time anyone realized the severity of the disease, it was already too late to stop the outbreak. On February 23, a man who had stayed across the hall from Dr. Liu at the Metropole traveled to Hanoi and died after infecting 80 people. On February 26, a woman checked out of the Metropole, traveled back to Toronto, and died after initiating an outbreak there. On March 1, a third guest was admitted to a hospital in Singapore, where sixteen additional cases of the illness arose within two weeks.23.Consider that it took four years for the Black Death, which killed over a third of all Europeans in the 14th Century, to travel from Constantinople to Kiev. Or that HIV took two decades to circle the globe. 
In contrast, this mysterious new disease had crossed the Pacific Ocean within a week of entering Hong Kong.As health officials braced for the impact of the fastest-traveling virus in human history, panic set in. Businesses were closed, sick passengers were removed from airplanes, and Chinese officials threatened to execute infected patients who violated quarantine. In the process, the mysterious new disease earned a name: Severe Acute Respiratory Syndrome, or SARS.Finding the source of the outbreakSARS was deadly, killing close to 10% of those who became sick.4 But it also struggled to spread much farther within the human population, and it was contained in July 2003 with fewer than 10,000 confirmed symptomatic cases worldwide.Scientists initially thought that humans had contracted SARS from palm civets, which are native to Guangdong. But research would later show that the disease likely originated in bats, notorious disease carriers.5In 2017, researchers published the results of five years of sampling horseshoe bats from a cave in Yunnan province. They found that the bats harbored coronaviruses with remarkable genetic similarity to SARS, and they hypothesized that the virus may have come from horseshoe bats. Yet their work has become infamous because they identified additional viruses in the bats that were related to SARS but just as capable of entering human cells. Their words are now chilling:6 We have also revealed that various [viruses] … are still circulating among bats in this region. Thus, the risk of spillover into people and emergence of a disease similar to SARS is possible. This is particularly important given that the nearest village to the bat cave we surveyed is only 1.1 km away, which indicates a potential risk of exposure to bats for the local residents. 
Thus, we propose that monitoring of SARSr-CoV evolution at this and other sites should continue, as well as examination of human behavioral risk for infection and serological surveys of people, to determine if spillover is already occurring at these sites and to design intervention strategies to avoid future disease emergence.A new threat emergesOn December 30, 2019, a Chinese ophthalmologist named Li Wenliang sent a WeChat message to fellow doctors at Wuhan Central Hospital, warning them that he had seen several patients with symptoms resembling SARS 1. He urged his colleagues to wear protective clothing and masks to shield them from this new potential threat.The next day, a screenshot of his post was leaked online, and local police summoned Dr. Li and forced him to sign a statement that he had “severely disturbed public order”. He then returned to work, treating patients in the same Wuhan hospital.Meanwhile, the World Health Organization (WHO) received reports regarding multiple pneumonia cases from the Wuhan Municipal Health Commission and activated a support team to assess the new disease. The WHO declared on January 14 that local authorities had seen “no clear evidence of human-to-human transmission of the novel coronavirus”. By this point, it was now too late.Throughout January, the virus silently raged through China, spreading to both South Korea and the United States as Lunar New Year celebrations took place within the country. By the end of the month, the disease was in 19 countries, as shown below.The number of reported confirmed cases of 2019-nCoV (COVID-19) as of January 30th, 2020. Figure courtesy World Health Organization 7.Within the next two months, the disease exploded across the planet, becoming a pandemic and earning a name in the process: Coronavirus disease 2019 (COVID-19).As for Dr. Li? Despite warning against the risk of this new virus, he contracted the disease from one of his patients on January 8. 
He continued working, and was himself admitted to the hospital on January 31. Within a week, he was dead, one of the first of millions of COVID-19 casualties.Why were the two outbreaks so different?The similarity between SARS and COVID-19 extends well beyond their symptoms. The viruses causing these diseases, whose respective names are SARS coronavirus (SARS-CoV) and SARS coronavirus 2 (SARS-CoV-2), are both coronaviruses, which means that their outer membranes are covered in a layer of spike proteins that cause them to look like the sun’s corona during an eclipse (see figure below). In fact, if we look at the two viruses under a microscope, they look virtually identical.Coronaviruses as seen under a microscope. The fuzzy blobs on the cell surface are spike proteins, which the virus uses to gain entry to host cells. Figure courtesy F. Murphy and S. Whitfield, CDC8.Not only do the two viruses look similar, they also use the same mechanism to infect human cells, in which the spike protein on the virus surface bonds to the ACE2 enzyme on a human cell’s membrane.910 So why did SARS fizzle, but SARS-CoV-2, a virus that is on average less harmful1112 and less deadly to individuals, transform into an uncontrollable pandemic? The most likely explanation for the ability of SARS-CoV-2 to spread across far more countries and remain a public health threat even in the face of lockdowns is that it spreads more easily (i.e., it is more infectious).Part of the reason for the spread of SARS-CoV-2 is that it can be transmitted by individuals who are asymptomatic,13 a mode of transmission that was never found in SARS.14 But we also wonder whether we can find a biological basis for the increased infectiousness of SARS-CoV-2.In this module, we will place ourselves in the shoes of early SARS-CoV-2 researchers studying the new virus in January 2020. 
The virus’s genome (the 30,000-nucleotide sequence making up its RNA) was published on January 101516, and an annotation of this genome showing the position of the virus’s genes is shown in the figure below. Upon sequence comparison, SARS-CoV-2 was found to be related to several coronaviruses isolated from bats and distantly related to SARS-CoV, the viral strain that caused the 2003 SARS outbreak. In fact, SARS-CoV-2 has a sequence identity of around 96% with bat coronavirus RaTG13, providing further evidence that the virus originated in bats.An annotated genome of SARS-CoV-2. The Spike protein, found at the bottom of this image, is labeled “S” and begins at position 21,563. Accessed from GenBank: https://go.usa.gov/xfzMM.We now ask ourselves two questions. First, can we use the virus’s genome to determine the structure of its spike protein? Second, once we know the structure of the SARS-CoV-2 spike protein, how do its structure and function differ from the same protein in SARS-CoV? These two questions are central to understanding (and therefore fighting) this deadly virus.We will split our work on these two questions. If you are already familiar with protein structure prediction, then you may want to skip ahead to the second part of the module, in which we discuss differences between the two viruses.Continue to part 1: structure predictionJump to part 2: spike protein comparisons Green, A. (2020, February 18). Li Wenliang. The Lancet, 395(10225), P682. https://doi.org/10.1016/S0140-6736(20)30382-2 ↩ ↩2 Hung L. S. 2003. The SARS epidemic in Hong Kong: what lessons have we learned?. Journal of the Royal Society of Medicine, 96(8), 374–378. https://doi.org/10.1258/jrsm.96.8.374 ↩ Update 95 - SARS: Chronology of a serial killer. (2015, July 24). Retrieved August 17, 2020, from https://www.who.int/csr/don/2003_07_04/en/ ↩ CDC SARS Response Timeline. 2013, April 26. 
Retrieved August 17, 2020, from https://www.cdc.gov/about/history/sars/timeline.htm ↩ Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J., Wang, H., Crameri, G., Hu, Z., Zhang, H., Zhang, J., McEachern, J., Field, H., Daszak, P., Eaton, B. T., Zhang, S., Wang, L. (2005). Bats Are Natural Reservoirs of SARS-Like Coronaviruses. Science, 310(5748), 676-679. doi:10.1126/science.1118391 ↩ Hu, B., Zeng, L., Yang, X., Ge, X., Zhang, W., Li, B., Xie, J., Shen, X., Zhang, Y., Wang, N., Luo, D., Zheng, X., Wang, M., Daszak, P., Wang, L., Cui, J., Shi, Z. 2017. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLOS Pathogens, 13(11). doi:10.1371/journal.ppat.1006698 ↩ Novel Coronavirus(2019-nCoV) Situation Report – 10. (2020, January 30). https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200130-sitrep-10-ncov.pdf?sfvrsn=d0b2e480_2 ↩ Murphy, F., Whitfield, S. 1975. ID#: 10270. Public Health Image Library, CDC. https://phil.cdc.gov/Details.aspx?pid=10270 ↩ Shang, J., Ye G., Shi, K., Wan, Y., Luo, C., Aihara, H., Geng, Q., Auerbach, A., Li, F. 2020. Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221-224. ↩ Li, F., Li, W., Farzan, M., Harrison, S. C. 2005. Structure of SARS Coronavirus Spike Receptor-Binding Domain Complexed with Receptor. Science 309, 1864-1868. ↩ Q&A on coronaviruses (COVID-19). (2020, April 17). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answers-hub/q-a-detail/q-a-coronaviruses ↩ Paules C.I., Marston H.D., Fauci A.S. 2020. Coronavirus Infections—More Than Just the Common Cold. JAMA. 323(8):707–708. doi:10.1001/jama.2020.0757 ↩ Tan, J., Liu, S., Zhuang, L., Chen, L., Dong, M., Zhang, J., & Xin, Y. 2020. Transmission and clinical characteristics of asymptomatic patients with SARS-CoV-2 infection. Future Virology, 10.2217/fvl-2020-0087. 
https://doi.org/10.2217/fvl-2020-0087 ↩ Severe Acute Respiratory Syndrome (SARS) Frequently Asked Questions. (n.d.) https://www.cdc.gov/sars/about/faq.html ↩ Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. https://www.ncbi.nlm.nih.gov/nuccore/MN908947 ↩ Annotated Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome. https://go.usa.gov/xfzMM ↩ "
} ,
{
"title" : "Methylation Helps a Bacterium Adapt to Differing Concentrations",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_adaptation",
"date" : "",
"content" : "Bacterial tumbling frequencies remain constant despite background attractant concentrationsIn the previous lesson, we explored the signal transduction pathway by which E. coli can change its tumbling frequency in response to a change in the concentration of an attractant. But the reality of cellular environments is that the concentration of a substance in these environments can vary across several orders of magnitude. The cell therefore needs to detect not absolute concentrations of a substance but rather relative changes so that it can move in the direction of an attractant (or away from a repellent).STOP: Consider two bacterial cells, both of which are in well-mixed environments with fixed glucose concentrations. The first cell’s environment has a glucose concentration of x, and the second cell’s environment has a glucose concentration of 0.1x. Should the default tumbling frequency of the two cells be the same? What if we drop a sugar cube into both environments? Should the two cells respond in the same way or in different ways in response to the attractant?The ability of E. coli to react to relative changes in its environment is not present in our current model of chemotaxis. According to our current model, if a cell is in an environment with high background concentration of an attractant, then the cell will detect a signal and lower its tumbling frequency. If the concentration continues to increase, then it may not be able to lower this frequency any further.E. coli detects relative changes in its concentration via adaptation to the signal concentration. If the concentration of attractant remains constant for a period of time, then regardless of the absolute value of the concentration, the cell returns to the same background tumbling frequency. In other words, E. 
coli demonstrates robustness to the background concentration of attractant in maintaining its default tumbling behavior.In this lesson, we will investigate the biochemical mechanism that E. coli uses to achieve such a robust response to environments with different background concentrations. We will then further expand the model we built in the previous lesson to see if this model can replicate the bacterium’s adaptive response.Bacteria have a “memory” of past concentrations using methylationRecall from the previous lesson that in the absence of an attractant, CheW and CheA readily bind to an MCP, leading to greater autophosphorylation of CheA, which in turn phosphorylates CheY. The greater the concentration of phosphorylated CheY, the more frequently the bacterium tumbles.Signal transduction is achieved through phosphorylation, but E. coli maintains a “memory” of past environmental concentrations through a chemical process called methylation. In this reaction, a methyl group (-CH3) is added to an organic molecule; the removal of a methyl group is called demethylation.Every MCP receptor contains four methylation sites, meaning that between zero and four methyl groups can be added to the receptor. On the plasma membrane, many MCPs, CheW, and CheA molecules form an array structure. Methylation reduces the negative charge on the receptors, stabilizing the array and facilitating CheA autophosphorylation. The more sites that are methylated, the higher the autophosphorylation rate of CheA, which means that CheY has a higher phosphorylation rate, and tumbling frequency increases.We now have two different ways that tumbling frequency can be elevated. First, if the concentration of an attractant is low, then CheW and CheA freely form a complex with the MCP, and the phosphorylation cascade passes phosphoryl groups to CheY, which interacts with the flagella and keeps tumbling frequency high. 
Second, an increase in MCP methylation can also boost CheA autophosphorylation and lead to an increased tumbling frequency.Methylation of MCPs is achieved by an additional protein called CheR. CheR methylates MCPs while bound to them, and it methylates ligand-bound MCPs faster12, so the rate of MCP methylation by CheR is higher if the MCP is bound to a ligand.3 Therefore, say that E. coli encounters an increase in attractant concentration. Then the lack of a phosphorylation cascade will mean that there is less phosphorylated CheY, and so the tumbling frequency will decrease. However, if the attractant concentration levels off, then the tumbling frequency will flatten, while CheR starts methylating the MCP. Over time, the rising methylation will increase CheA autophosphorylation, bringing back the phosphorylation cascade and raising tumbling frequency back to default levels.Just as the phosphorylation of CheY can be reversed, MCP methylation can be undone as well, so that it is not permanent. In particular, an enzyme called CheB, which like CheY is phosphorylated by CheA, demethylates MCPs (as well as autodephosphorylates). The rate of an MCP’s demethylation is dependent on the extent to which the MCP is methylated. In other words, the rate of MCP methylation is higher when the MCP is in a low methylation state, and the rate of demethylation is faster when the MCP is in a high methylation state.3The figure below adds CheR and CheB to provide a complete picture of the core pathways influencing chemotaxis. To model these pathways, we will need to add quite a few molecules and reactions to our current model.The chemotaxis signal-transduction pathway with methylation included. CheA phosphorylates CheB, which demethylates MCPs, while CheR methylates MCPs. Blue lines denote phosphorylation, grey lines denote dephosphorylation, and the green arrow denotes methylation. 
Image modified from Parkinson Lab’s illustrations.Combinatorial explosion and the need for rule-based modelingOur goal is to expand the BioNetGen model that we built in the previous lesson, and then see if this model can replicate the adaptation behavior of E. coli in the presence of a changing attractant concentration. Before incorporating the adaptation mechanisms in our BNG model, we will first describe the reactions that BioNetGen will need.We begin by considering the MCP complexes. In the phosphorylation tutorial, we identified two components relevant for reactions involving MCPs: a ligand-binding component l and a phosphorylation component Phos. The adaptation mechanism introduces two additional reactions: methylation of the MCP by CheR, and demethylation of the MCP by CheB.We also need to include binding and dissociation reactions between the MCP and CheR because under normal conditions, most CheR are bound to MCP complexes.4 We will therefore give the MCP molecules two additional components: r (denoting CheR-binding) and Meth (denoting methylation states). In our simulation, we will use three methylation levels (low, medium, and high) because these three states are most involved in the chemotaxis response to attractants.5Imagine for a moment that we were attempting to specify every reaction that could take place in our model. To specify an MCP, we would need to tell the program whether it is bound to a ligand (two possible states), whether it is bound to CheR (two possible states), whether it is phosphorylated (two possible states), and which methylation state it is in (three possible states). Therefore, a given MCP has 2 · 2 · 2 · 3 = 24 total states.Say that we are simulating the simple reaction of a ligand binding to an MCP, which we originally wrote as T + L → TL. We now need this reaction to include 12 of the 24 states, which are those corresponding to the MCP being unbound to the ligand. 
Our simple reaction would become 12 different reactions, one for each possible unbound state of the complex molecule T. And if the situation were just a little more complex, with the ligand molecule L having n possible states, then we would have 12n reactions. Imagine trying to debug a model in which we had accidentally incorporated a typo when transcribing just one of these reactions!In other words, as our model grows, with multiple different states for each molecule involved in each reaction, the number of reactions we need to represent the system grows very fast; this phenomenon is called combinatorial explosion. Our model of chemotaxis is ultimately relatively straightforward, but combinatorial explosion means that building realistic models of biochemical systems at scale without a simplifying language can be daunting if not impossible.A major benefit of using a rule-based modeling language provided by BioNetGen is that it circumvents combinatorial explosion by consolidating many reactions into a single rule. For example, when modeling ligand-MCP binding, we can summarize the 12 different reactions with the rule “a free ligand molecule binds to an MCP that is not bound to a ligand molecule.” In the BioNetGen language, this rule is represented by the same one-line expression as it was in the previous lesson:LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_disWhy is one rule enough? Recall from our discussion of the Gillespie algorithm that the wait time before the next reaction to be sampled depends only on the rate of all relevant reactions in the system. In this particular case, the rate of ligand-MCP binding depends on the total concentration of free ligands and unbound MCPs, but it does not depend on the state that an MCP is in.In the following tutorial, we will expand our BioNetGen model from the previous tutorial into one that can incorporate CheR binding as well as MCP methylation while avoiding combinatorial explosion. 
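To make the state count above concrete, here is a short illustrative Python sketch (not part of the course’s BioNetGen tutorials; the state labels are our own) that enumerates every state an MCP could occupy if we listed reactions explicitly:

```python
from itertools import product

# Each component of the MCP, with its possible states (labels are illustrative)
ligand_states = ["unbound", "bound"]     # ligand-binding component l
cher_states = ["free", "CheR-bound"]     # CheR-binding component r
phos_states = ["U", "P"]                 # phosphorylation component Phos
meth_states = ["low", "medium", "high"]  # methylation component Meth

# Every combination of component states is a distinct MCP state
all_states = list(product(ligand_states, cher_states, phos_states, meth_states))
print(len(all_states))  # 2 * 2 * 2 * 3 = 24 total states

# Only ligand-unbound MCPs can participate in the binding reaction T + L -> TL,
# so writing that single reaction explicitly would require one copy per such state
unbound = [s for s in all_states if s[0] == "unbound"]
print(len(unbound))  # 12
```

A rule-based language avoids listing these 12 variants by matching on only the components a reaction actually cares about.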
We will then examine whether this model allows us to learn anything about how a bacterium can adapt to changes in the relative concentration of attractant.Visit tutorialBacterial tumbling is resilient to large sudden changes in ligand concentrationIn the figures below, we show plots of the concentration of each molecule of interest in our system for a few different cases. In each case, we suddenly change the concentration of the attractant ligand (l0) and examine how this affects the concentration of phosphorylated CheY (the molecule whose phosphorylation is directly correlated with increased tumbling frequency). The attractant concentration will then level off; because the relative concentration is not changing, will our model reflect the hypothesis that E. coli can return to approximately the same steady-state concentration of phosphorylated CheY regardless of the concentration of the ligand?Below, we show simulation results for several different concentrations of ligand molecules added at the beginning of the simulation. First, we add a relatively small amount, setting l0 equal to 10,000. The system returns to equilibrium in phosphorylated CheY so quickly that it is hard to believe the attractant has had any effect on tumbling frequency.Note: Time is shown in seconds on the x-axis in the following figures.If instead l0 is equal to 100,000, we obtain the figure below. After a drop in the concentration of phosphorylated CheY, the system returns to equilibrium after a few minutes.When we increase l0 by another factor of ten to 1 million, the initial drop is more pronounced, but the system returns to equilibrium just as quickly. 
Note how much higher the concentration of methylated receptors is in this figure compared to the previous one; however, there is still a significant concentration of receptors with low methylation, indicating that the system may be able to handle a yet bigger jolt.When we set l0 equal to 10 million, we give the system this bigger jolt. Once again, the model is resilient to this change in the concentration of the ligand after a few minutes.Finally, with l0 equal to 100 million, we see what we might expect: the steepest drop in phosphorylated CheY yet, but a system that is able to return to equilibrium.Our model has therefore provided compelling evidence that the E. coli chemotaxis system is very robust to changes in its environment. The simulated bacterium can make a very rapid change in response to a sudden change in its environment, but even if this change is significant, the system will return to its default state. This robustness in our simulation has been observed in real bacteria67, as well as replicated by other computational simulations8.Aren’t bacteria magnificent?However, our work is not done. We have simulated how a bacterium can adapt to a single sudden change in its environment, but life is about responding to changes all the time. So in the next lesson, we will further examine how our simulated E. coli responds in an environment in which the ligand concentration is changing constantly.Next lessonAdditional resourcesSome resources if you are interested in chemotaxis biology: Amazing introduction to chemotaxis: Parkinson Lab website. A good overview: article by Webre et al., published in 2003. Available online Details on chemotaxis pathway and MCPs: review article by Baker et al., published in 2005. Available online. Details on MCPs: more recent review by Parkinson et al., published in 2015. Available online. Modeling robustness and integral feedback: lecture note by Berg in 2008. Available online. Amin DN, Hazelbauer GL. 2010. 
Chemoreceptors in signaling complexes: shifted conformation and asymmetric coupling. Available online ↩ Terwilliger TC, Wang JY, Koshland DE. 1986. Kinetics of receptor modification: the multiply methylated aspartate receptors involved in bacterial chemotaxis. The Journal of Biological Chemistry. Available online ↩ Spiro PA, Parkinson JS, and Othmer H. 1997. A model of excitation and adaptation in bacterial chemotaxis. Biochemistry 94:7263-7268. Available online. ↩ ↩2 Lupas A., and Stock J. 1989. Phosphorylation of an N-terminal regulatory domain activates the CheB methylesterase in bacterial chemotaxis. J Bio Chem 264(29):17337-42. Available online ↩ Boyd A., and Simon MI. 1980. Multiple electrophoretic forms of methyl-accepting chemotaxis proteins generated by stimulus-elicited methylation in Escherichia coli. Journal of Bacteriology 143(2):809-815. Available online ↩ Shimizu TS, Delalez N, Pichler K, and Berg HC. 2005. Monitoring bacterial chemotaxis by using bioluminescence resonance energy transfer: absence of feedback from the flagellar motors. PNAS. Available online ↩ Krembel A., Colin R., Sourjik V. 2015. Importance of multiple methylation sites in Escherichia coli chemotaxis. Available online ↩ Bray D, Bourret RB, Simon MI. 1993. Computer simulation of phosphorylation cascade controlling bacterial chemotaxis. Molecular Biology of the Cell. Available online ↩ "
} ,
{
"title" : "A Biochemically Accurate Model of Bacterial Chemotaxis",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_biochemistry",
"date" : "",
"content" : "Transducing a signal to a cell’s interiorIn the previous two lessons, we discussed how a cell recognizes an extracellular signal when receptor proteins on the cell’s surface bond to ligands, and how to model the reversible ligand-receptor reaction using stochastic simulation via the Gillespie algorithm. We now turn to the question of how the cell conveys the extracellular signal it detects via ligand-receptor bonding to the cell’s interior in order to produce an action via the process of signal transduction. For example, if E. coli senses an increase in the concentration of glucose, meaning that more ligand-receptor bonding is taking place at the receptor that recognizes glucose, how is E. coli able to change its behavior as a result of this increased bonding?The engine of signal transduction is a series of phosphorylation events. Phosphorylation is a chemical reaction that attaches a phosphoryl group (PO3-) to an organic molecule. Phosphoryl modifications serve as an information exchange of sorts because they activate or deactivate certain enzymes.A phosphoryl group usually comes from one of two sources. First, the phosphoryl can be broken off of an adenosine triphosphate (ATP) molecule, the “energy currency” of the cell, producing adenosine diphosphate (ADP). Second, the phosphoryl can be exchanged from a phosphorylated molecule that has had its phosphoryl group removed in a dephosphorylation reaction.In the case of chemotaxis, a sequence of phosphorylation events inside E. coli called a phosphorylation cascade serves to transmit information within the cell about the amount of ligand bonding being detected on the exterior of the cell. In this lesson, we discuss the details of how this cascade of chemical reactions leads to a change in bacterial movement.Chemotaxis pathwayA high-level view of the transduction pathway for chemotaxis is shown in the figure below. The cell membrane contains receptors called methyl-accepting chemotaxis proteins (MCPs). 
The MCPs, which bridge the cellular membrane, bond to ligand stimuli in the cell exterior, and then also bond to other proteins on the inside of the cell. The pathway includes a number of additional proteins, which all start with the prefix Che (short for “chemotaxis”). In what follows, we explain the reactions detailed in this figure.A summary of the chemotaxis transduction pathway. A ligand binding signal is propagated through CheA and CheY phosphorylation, which leads to a response of clockwise flagellar rotation. The blue curved arrow denotes phosphorylation, the grey curved arrow denotes dephosphorylation, and the blue dashed arrow denotes a chemical interaction. Figure is a simplification of Parkinson Lab’s illustrations.On the interior of the cellular membrane, MCPs form complexes with two proteins called CheW and CheA; in the absence of MCP-ligand bonding, this complex is more stable. When bound to the complex, the CheA molecule autophosphorylates, meaning that it adds a phosphoryl group taken from ATP to itself — a concept that might seem mystical if you have not already followed our discussion of autoregulation in the previous module.If CheA is phosphorylated, then it can pass on the phosphoryl group to a molecule called CheY, which interacts with the flagellum in the following way. Each flagellum has a protein complex called the flagellar motor switch that is responsible for controlling the direction of flagellar rotation. The interaction of this protein complex with phosphorylated CheY induces the change of flagellar rotation from counter-clockwise to clockwise. As we discussed earlier in this module, this change in flagellar rotation causes the bacterium to tumble, which in the absence of an increase in attractant occurs every 1 to 1.5 seconds.Yet when a ligand binds to the MCP, the MCP undergoes conformation changes, which reduce the stability of the complex with CheW and CheA. 
As a result, CheA is less readily able to autophosphorylate, which means that it does not phosphorylate CheY, and so because there is less phosphorylated CheY, the tumbling frequency decreases.In other words, the exchange of phosphoryl groups means that a ligand exterior to the cell can indirectly serve as an inhibitor for phosphorylated CheA as well as phosphorylated CheY. Thus, ligand binding causes fewer flagellar interactions and in turn less tumbling of the bacterium.It is critical that as part of this process, a high concentration of phosphorylated CheY can be decreased if a ligand is detected; otherwise, the cell will not be able to change its tumbling frequency. To this end, the cell needs a complementary reaction that reverses the phosphorylation of CheY; this dephosphorylation reaction is catalyzed by an enzyme called CheZ.Adding phosphorylation events to our model of chemotaxisWe would like to simulate the reactions driving chemotaxis signal transduction and see what happens if the bacterium “senses an attractant”, meaning that the attractant ligand’s concentration increases and leads to more receptor-ligand binding. To do so, we will build on the particle-free model for ligand-receptor dynamics that we introduced in the previous lesson.This model will be more complicated than any we have introduced thus far in the course. We will need to account for both bound and unbound MCP molecules, as well as phosphorylated and unphosphorylated CheA and CheY enzymes. We will also need to model phosphorylation reactions of CheA that depend on the current concentrations of bound and unbound MCP molecules.We introduced BioNetGen in a previous tutorial when implementing the Gillespie algorithm for our computation of the equilibrium of bound ligand-receptor complexes. 
However, BioNetGen is useful not only for running particle-free simulations, but also because it implements its own language for rule-based modeling.Say that we were to specify all reactions using the style of modeling reactions used in previous modules. We would need one particle type to represent unbound MCP molecules, another particle type to represent ligands, and a third to represent bound complexes. A bound complex molecule binds with CheA and CheW and can be either phosphorylated or unphosphorylated, necessitating two different molecule types. In turn, CheY can be phosphorylated or unphosphorylated as well, requiring two more particles.Instead, the BioNetGen language will allow us to conceptualize this system much more concisely using rules that can apply to particles that are in a variety of states. First, we will make the simplifying assumption about our system that the receptor includes CheA and CheW, so that we do not need to represent these as separate particles. The BioNetGen representation of the four particles in our system is shown below. The notation Phos~U~P indicates that a given molecule type can be either phosphorylated or unphosphorylated, so that we do not need multiple different particles to represent the molecule.

L(t) #ligand molecule
T(l,Phos~U~P) #receptor complex
CheY(Phos~U~P)
CheZ()

The conciseness of BioNetGen’s molecule representation allows us to represent our reactions concisely as well. First, we reproduce the binding and dissociation reactions from the ligand-receptor binding tutorial; please refer to this tutorial for an explanation of the specific notation used in the rest of this section.

LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis

Second, we represent the phosphorylation of the MCP complex. Recall that the phosphorylation of CheA can happen at different rates depending on whether the MCP is bound or not, and so we will need two different reactions to represent these different rates. 
We will assume that the phosphorylation of the MCP occurs at one fifth the rate when it is bound.

FreeTP: T(l,Phos~U) -> T(l,Phos~P) k_T_phos
BoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*0.2

Finally, we represent the phosphorylation and dephosphorylation of CheY. The former requires a phosphorylated MCP receptor, while the latter is done with the help of a CheZ molecule.

YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos
YDep: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos

Now that we have written these reactions representing the chemotaxis signal transduction pathway, we would like to see what happens when we change the concentrations of the ligand. Ideally, the bacterium should be able to distinguish between different ligand concentrations. That is, the higher the concentration of an attractant ligand, the lower the concentration of phosphorylated CheY, and the lower the tumbling frequency of the bacterium.But does higher attractant concentration in our model really lead to a lower concentration of phosphorylated CheY? Let’s find out by incorporating the phosphorylation pathway into our ligand-receptor model in the following BioNetGen tutorial.Visit tutorialTumbling frequency and changing ligand concentrationsThe following figure shows the concentrations of phosphorylated CheA and CheY in a system at equilibrium in the absence of ligand. 
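Before turning to the simulation results, the five reaction rules above can be sketched as a small Gillespie (stochastic simulation algorithm) program. This is a minimal illustrative sketch, not the module's BioNetGen tutorial: the rate constants and molecule counts below are invented for demonstration, and the species names (T_U, LT_P, and so on) are our own shorthand for the free/bound, un/phosphorylated receptor states.

```python
import random

# Made-up rate constants -- NOT the values from the BioNetGen tutorial.
RATES = {"bind": 0.0005, "dis": 0.1, "freeTP": 0.5,
         "boundTP": 0.5 * 0.2,   # bound MCP phosphorylates at one fifth the rate
         "YP": 0.002, "YDep": 0.01}

# Free/bound receptor complexes (T/LT), each un/phosphorylated (U/P)
INIT = {"L": 1000, "T_U": 500, "T_P": 0, "LT_U": 0, "LT_P": 0,
        "Y_U": 500, "Y_P": 0, "Z": 100}

EFFECTS = {
    "bind_U":  {"L": -1, "T_U": -1, "LT_U": +1},
    "bind_P":  {"L": -1, "T_P": -1, "LT_P": +1},
    "dis_U":   {"L": +1, "T_U": +1, "LT_U": -1},
    "dis_P":   {"L": +1, "T_P": +1, "LT_P": -1},
    "freeTP":  {"T_U": -1, "T_P": +1},
    "boundTP": {"LT_U": -1, "LT_P": +1},
    "YP_f":    {"T_P": -1, "T_U": +1, "Y_U": -1, "Y_P": +1},  # phosphoryl moves T -> Y
    "YP_b":    {"LT_P": -1, "LT_U": +1, "Y_U": -1, "Y_P": +1},
    "YDep":    {"Y_P": -1, "Y_U": +1},                        # CheZ acts as a catalyst
}

def propensities(s):
    return {"bind_U": RATES["bind"] * s["L"] * s["T_U"],
            "bind_P": RATES["bind"] * s["L"] * s["T_P"],
            "dis_U": RATES["dis"] * s["LT_U"],
            "dis_P": RATES["dis"] * s["LT_P"],
            "freeTP": RATES["freeTP"] * s["T_U"],
            "boundTP": RATES["boundTP"] * s["LT_U"],
            "YP_f": RATES["YP"] * s["T_P"] * s["Y_U"],
            "YP_b": RATES["YP"] * s["LT_P"] * s["Y_U"],
            "YDep": RATES["YDep"] * s["Z"] * s["Y_P"]}

def gillespie(t_end=5.0, seed=0):
    rng, s, t = random.Random(seed), dict(INIT), 0.0
    while t < t_end:
        props = propensities(s)
        total = sum(props.values())
        if total == 0:
            break
        t += rng.expovariate(total)      # waiting time until the next reaction
        pick = rng.uniform(0.0, total)   # choose a reaction proportionally to propensity
        for name, a in props.items():
            pick -= a
            if pick <= 0:
                for species, change in EFFECTS[name].items():
                    s[species] += change
                break
    return s

final = gillespie()
```

Because phosphoryl groups only move between molecules, the total counts of receptor complexes and of CheY are conserved throughout the simulation, which is a useful sanity check on any implementation.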
As we might expect, we see the concentrations of these particles remain at steady-state (with some noise in the concentrations), and we can presume that the cell stays at its background tumbling frequency.The sudden addition of 5,000 attractant ligand molecules increases the concentration of bound receptors, therefore leading to less CheA autophosphorylation and less phosphorylated CheY.If we instead add 100,000 attractant molecules, then we see an even more drastic decrease in phosphorylated CheA and CheY.In other words, the BioNetGen simulation confirms that an increase in attractant reduces the concentration of phosphorylated CheY, which therefore lowers the tumbling frequency.So … what’s the big deal?You may not be surprised that we have been able to build a model simulating the system that E. coli uses to detect extracellular concentration of ligand and change its behavior accordingly. After all, the biochemistry presented here may be elegant, but it is also simple.But what we have shown in this lesson is just half of the story. In the next lesson, we will see that the biochemical realities of chemotaxis are even more complicated, and for good reason — this complexity will allow E. coli to react with surprising intelligence to a dynamic world.Next lesson"
} ,
{
"title" : "Conclusion: The Beauty of *E. coli*'s Robust Randomized Exploration Algorithm",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_conclusion",
"date" : "",
"content" : "Two randomized exploration strategiesIn this final lesson of the module, we will use what we have learned about chemotaxis to build a random walk model emulating the behavior of E. coli within a background containing a variable concentration of attractant. The bacterium will reorient itself randomly, but it will be able to change its tumbling frequency based on the relative concentration of attractant at its current location.We will then compare this realistic algorithm of bacterial movement against a baseline simplistic algorithm in which the bacterium walks for a fixed distance and then reorients itself in a random direction. Does the more realistic exploration algorithm allow the bacterium to find attractant faster?We will represent a bacterium as a point in two-dimensional space. Units in our space will be measured in µm, so that moving from (0, 0) to (0, 20) is 20µm, a distance that we know from the introduction can be covered by the bacterium in 1 second during an uninterrupted run. The bacterium will start at the origin (0, 0), which we will establish to have a ligand concentration of 100 molecules/µm3.At any point (x, y), there is some concentration L(x, y) of ligand; furthermore, we simulate an attractant gradient by ensuring that there is a point (called the goal) at which L(x, y) is maximized, with the concentration of attractant diminishing as the distance from this point increases. The goal contains a maximum concentration of 108 molecules/µm3, and we will place the goal at (1500, 1500), so that the bacterium must travel a significant distance to locate the attractant.The concentration of ligands L(x, y) is maximized at the goal and decreases exponentially the farther we travel from it. 
To represent this, we set L(x, y) = 100 · 10^(6 · (1 − d/D)), where d is the distance from (x, y) to the goal, and D is the distance from the origin to the goal, which in this case is 1500√2 ≈ 2121 µm.STOP: How can we quantify how well a bacterium has done at finding the attractant?For each of our two strategies, we will simulate many random walks of a given bacterium throughout this space, where each simulation lasts some fixed time. (The total time needed by our simulation should be large enough to allow the bacterium to have enough time to reach the goal.) To compare the two strategies, we will then measure how far on average a bacterium with each strategy is from the goal at the end of the simulation.Now that we have established how we will model a bacterium searching for attractant, we will specify how we will implement each of the two specific strategies that we wish to model.Strategy 1: Standard random walkTo model our “unintelligent” random walk strategy, we first select a random direction of movement along with a duration of our tumble. The degree of reorientation follows a uniform distribution from 0° to 360°. The duration of each tumble follows an exponential distribution with mean 0.1 seconds[1]. As the result of a tumble, the cell only changes its orientation, not its position.We then select a random duration to run and let the bacterium run in that direction for the specified amount of time. The duration of each run follows an exponential distribution with mean equal to the experimentally verified value of 1 second.We then iterate these two steps of tumbling and running until the total time used is equal to the time devoted to the simulation.In the following tutorial, we simulate this naive strategy using a Jupyter notebook that will also help us visualize the results of the simulation.Visit standard random walk tutorialStrategy 2: Chemotactic random walkIn our second strategy, we attempt to mimic the real response of E. 
coli to its environment based on what we have learned about chemotaxis throughout this module. The bacterium will still follow a run and tumble model, but the duration of its runs (which is a function of its tumbling frequency) depends on the relative change in attractant concentration that it detects.To ensure a mathematically controlled comparison, we will use the same approach for determining the duration of a tumble and the resulting direction of a run as in strategy 1.This second strategy will therefore differ only in how it chooses the length of a run. Let t0 denote the mean background run duration, which in the first strategy was equal to 1 second, and let Δ[L] denote the difference between the ligand concentration L(x, y) at the cell’s current point and the ligand concentration at the cell’s previous point. We would like to choose a simple formula for the expected run duration like t0 * (1 + 10 · Δ[L]).However, there are two issues with using this formula. First, if Δ[L] is less than -0.1, then the run duration could be negative. Second, if Δ[L] is large, then the bacterium will run for so long that it may simply bypass the goal.To prevent the run duration from being negative, we will first take the maximum of t0 * (1 + 10 · Δ[L]) and some small positive number (we will use 0.000001). We will then take the minimum of the resulting value and 4 · t0 to prevent the run length from being too large. We will then use the resulting value as the mean of an exponential distribution to determine run duration.As with the first strategy, our simulated cell will alternate between tumbling and running until the total time devoted to the simulation has been consumed. 
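The gradient and run-length rules described above can be sketched in Python. This is a hedged sketch, not the module's Jupyter notebook: the function names are our own, and we leave the normalization of Δ[L] (how the tutorial scales the concentration difference so that values near ±0.1 are meaningful) to the caller, who passes it in directly.

```python
import math
import random

GOAL = (1500.0, 1500.0)
D = math.dist((0.0, 0.0), GOAL)   # origin-to-goal distance, 1500*sqrt(2) ~ 2121 um

def ligand(x, y):
    """L(x, y) = 100 * 10^(6 * (1 - d/D)): 100 at the origin, 10^8 at the goal."""
    d = math.dist((x, y), GOAL)
    return 100.0 * 10 ** (6 * (1 - d / D))

T0 = 1.0                          # mean background run duration (seconds)

def run_duration(delta_L, rng):
    """Sample a run length for the chemotactic strategy from an exponential
    distribution whose mean is t0 * (1 + 10 * delta_L), clamped as described."""
    mean = T0 * (1 + 10 * delta_L)
    mean = max(mean, 1e-6)        # never negative
    mean = min(mean, 4 * T0)      # never more than 4 * t0
    return rng.expovariate(1 / mean)
```

Note how the two clamps implement the text's two concerns: a floor keeps the mean positive even when the cell moves sharply down the gradient, and a ceiling of 4 · t0 keeps a lucky cell from overshooting the goal in one enormous run.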
In the following tutorial, we will adapt the Jupyter notebook that we built in the previous tutorial to simulate this second strategy.Visit chemotactic walk tutorialComparing the effectiveness of our two random walk strategiesThe following figure visualizes the trajectories of three cells using strategy 1 (left) and strategy 2 (right). After 500 seconds, cells using strategy 1 have traveled away from the origin, and some of them are found in locations with higher concentrations. The cells using strategy 2, however, quickly home in on the goal and remain near it.Sample trajectories for the two strategies. The standard random walk strategy is shown on the left, and the chemotactic random walk is shown on the right. Regions that are more heavily colored red correspond to higher concentrations of ligand, with a goal having maximum concentration at the point (1500, 1500), which is highlighted using a blue square. A single cell’s walk is colored from darker to lighter colors across the time frame of the trajectory.Of course, we should be wary of our small sample size. To confirm that what we observed in these trajectories is true on average, we will compare the two strategies for many simulations. The following figure plots the cell’s average distance to the goal over 500 simulations for both strategies.Average distance to the goal plotted over time for 500 cellular simulations following each of the two strategies; the standard random walk is shown in red, and the chemotactic random walk is shown in blue. The shaded area for each strategy represents one standard deviation from the average.Using strategy 1, cells have some chance of reaching the goal because they tend to spread out over time, but there is no aspect of the strategy that would keep the cells at the goal, and so the average distance to the goal does not decrease. 
With strategy 2, the cells are able to get closer to the goal and remain there due to the small standard deviation in their distance to the goal.Strategy 2 corresponds to a very slight change in strategy 1 in which we allow the cell to run for a greater distance if it senses an increase in the attractant concentration. But the direction of travel is still random. So why would this strategy be so much better than a pure random walk?The attractant detection serves as a sort of “rubber band”. If the bacterium is traveling down an attractant gradient (i.e., away from an attractant), then it is not allowed to travel very far in a single step. If an increase of attractant is detected, then the cell can travel farther before tumbling. On average, then, this effect helps to pull the bacterium in the direction of increasing attractant, even though each of its steps is taken in a random direction.We have shown that a very slight change to a simple randomized algorithm can produce an elegant approach for exploring an unknown environment. But we left one more question unanswered. Why is it that a default tumbling frequency of one tumble per second appears to be evolutionarily stable across a wide range of bacteria?To address this question, we will make changes to t0, the default time for a run step, and see how this affects the ability of a simulated bacterium following the chemotactic strategy to locate the goal. You may like to adjust the value of t0 in the chemotactic walk tutorial before continuing on.Visit tutorialWhy is bacterial background tumbling frequency constant across species?The following figures show three trajectories for a few different values of t0 and a simulation that lasts for 800 seconds. First, we set t0 equal to 0.2 seconds and see that the bacteria are not able to walk far enough in a single step. 
That is, the “rubber band” effect is too rigid.Three sample trajectories of a simulated cell following the chemotactic random walk strategy with a tumble every 0.2 seconds on average.If we increase t0 to 5.0 seconds, then the rubber band becomes too flexible, meaning that cells can run past the goal without being able to put on the brakes by tumbling.Three sample trajectories of a simulated cell following the chemotactic random walk strategy with a tumble every 5 seconds on average.When we set t0 equal to 1.0, we see a “Goldilocks” effect in which the rubber band effect is just right. The simulated bacterium can run for long enough at a time to head quickly toward the goal, and it tumbles frequently enough to keep it there.Three sample trajectories of a simulated cell following the chemotactic random walk strategy with a tumble on average once every second.To make this analysis more concrete, the figure below shows a plot of average distance to the goal over time for 500 simulated cells following the chemotactic strategy for a variety of choices of t0.Average distance to the goal over time for 500 cells. Each colored line indicates the average distance to the goal over time for a different value of t0; the shaded area represents one standard deviation.This figure illustrates a tradeoff between reaching the target quickly and being able to stay there. For large values of t0 (10.0, 5.0, 2.0), distances to the goal decrease very quickly at the beginning of the simulation, but the cells don’t stay there effectively. For small values of t0 (0.1, 0.2, 0.5), the cells fail to move to the ligand efficiently. When t0 is equal to 0.5 seconds, the cell is able to remain around the goal, but it takes about 400 seconds longer to reach the goal than when t0 is equal to 1.0 seconds.Recall the video of E. coli moving towards the sugar crystal that we showed at the beginning of this module, which we reproduce below. The video shows that the behavior of real E. 
coli is reflected by our simulated bacteria. Bacteria generally move towards the crystal and then remain close to it; some bacteria run by the crystal, but they then turn around to move toward the crystal again.Bacteria are even smarter than we thoughtIf you closely examine the video above, then you may be curious about the way that bacteria turn around and head back toward the attractant. When they reorient, their behavior appears more intelligent than simply walking in a random direction. The reason for this behavior is that, like most things in biology, the reality turns out to be more complex than we might at first imagine.Specifically, researchers first showed that the direction of reorientation follows a normal distribution with mean of 68° and standard deviation of 36°[2]. That is, the bacterium typically does not tend to make as drastic of a change to its orientation as it would in a pure random walk, which would on average have a change in orientation of 90°.Yet recent research has shown that the direction of the bacterium’s reorientation depends on whether the cell is traveling in the correct direction[3]. If moving up an attractant gradient, then the cell makes much smaller changes in its reorientation angle. This allows the cell to retain its orientation if it is moving in the correct direction while also turning around quickly if it starts heading in the wrong direction. We can even see this behavior in the video above, in which bacteria traveling toward the attractant make only very slight changes in their direction of travel, but reorient themselves more drastically if they overshoot the target.In this module, we have witnessed the emergence of an apparently intelligent algorithm from a simple collection of reactions that drive an organism’s biochemistry. 
What look like decisions made by the bacterium are in fact robust actions taken as the direct result of chemical reactions.Bacterial chemotaxis is probably the best understood biological system from the perspective of understanding how low-level chemical actions cause emergent behavior. But that is not to say that it is the only such system. As we pointed out in the prologue, this thread connecting chemical reactions to the behavior that we experience as life is for the most part still invisible. Although a simple system like bacterial chemotaxis can be understood concretely, a rationale for how microscopic processes drive macroscopic action is a mystery that may be unresolved for a very long time.Visit exercises [1] Saragosti J, Silberzan P, Buguin A. 2012. Modeling E. coli tumbles by rotational diffusion: implications for chemotaxis. PLoS One 7(4):e35412. Available online. ↩ [2] Berg HC, Brown DA. 1972. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature. Available online. ↩ [3] Saragosti J, Calvez V, Bournaveas N, Perthame B, Buguin A, Silberzan P. 2011. Directional persistence of chemotactic bacteria in a traveling concentration wave. PNAS. Available online. ↩ "
} ,
{
"title" : "Exercises",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_exercise",
"date" : "",
"content" : "How does E. coli respond to repellents?Just as E. coli has receptors that bond to attractant ligands, it has other receptors that can bond to repellent ligands.Exercise: Based on what we have learned in this module about how E. coli and other bacteria act in the presence of an attractant, what do you think that the chemotaxis response is in the presence of a repellent? How do you think that the bacterium adjusts to relative changes of the repellent?In the phosphorylation tutorial, we defined the rate constant for free CheA autophosphorylation k_T_phos, and specified that when the receptor complex is bound to an attractant molecule, the autophosphorylation rate constant decreases to 0.2 · k_T_phos. To model a receptor complex bound to a repellent molecule, we will need to change the autophosphorylation rate so that it is greater than k_T_phos.Exercise: Adapt the BioNetGen model so that the autophosphorylation rate constant is 5 · k_T_phos. then run your simulation for 3 seconds with L0 = 5000 and L0 = 1e5 repellent ligand molecules added at the beginning of the simulation. How does the concentration of phosphorylated CheY change? What do you conclude?What if E. coli has multiple attractant sources?Not only can E. coli sense both repellents and attractants, but it can detect more than one attractant gradient at the same time. This function has a clear evolutionary purpose in a bacterial environment of multiple sparsely populated food sources. In this section, we will explore whether the chemotaxis mechanism allows cells to navigate through heterogeneous nutrient distributions.Exercise: Modify our model from the adaptation tutorial to reflect two types of receptor, each specific to its own ligand (call them A and B). Assume that we have 3500 receptor molecules of each type. (Hint: you will not need to have additional molecules in addition to L and T. 
Instead, specify additional states for the two molecules that we already have; for example L(t,Lig~A) should only bind with T(l,Lig~A). Don’t forget to update seed species as well!)In the previous exercise, the cell adapts to the presence of two different attractants at the same time. We now consider what will happen if we only add molecules of B once the cell has already adapted to molecules of A.Exercise: Change your model by assuming that after the cell adapts to 1e6 molecules of A, 1e6 molecules of B are added. Observe the concentration of phosphorylated CheY. Is the cell able to respond to B after adapting to the concentration of ligand A? Why do you think that the change in CheY phosphorylation is different from the scenario in which we release the two different ligands concurrently? (The hint for the previous exercise also applies to this exercise.)In the chemotactic walk tutorial, we used a concentration gradient that grew exponentially toward a single goal. Specifically, if L(x, y) was the concentration of ligand at (x, y), we set L(x, y) = 100 · 10^(6 · (1 − d/D)), where d is the distance from (x, y) to the goal, and D is the distance from the origin to the goal (we used a goal of (1500, 1500)).To simulate an environment with more than one food source, we will include another goal at (-1500, 1500). The new ligand concentration formula will be L(x, y) = 100 · 10^(8 · (1 − d1/D1)) + 100 · 10^(8 · (1 − d2/D2)), where d1 is the distance from (x, y) to the goal at (1500, 1500), d2 is the distance from (x, y) to the goal at (-1500, 1500), and D1 and D2 are the distances from the origin to the two respective goals.Exercise: Change the chemotactic walk simulation so that it includes the two goals, and visualize the trajectories of several cells using a background tumbling frequency of once every second. Are the cells able to find one of the goals? 
How long does it take them?Exercise: Vary the tumbling frequency according to the parameters given in the chemotactic walk tutorial to see how tumbling frequency influences the average distance of a cell to the closer of the two goals. As in the tutorial, run your simulation for 500 cells with tumbling frequencies time_exp = [0.2, 0.5, 1.0, 2.0, 5.0].Changing the E. coli choice of directionIn the conclusion, we mentioned that when E. coli tumbles, the degree of reorientation is not uniformly random from 0° to 360°. Rather, research has shown that it follows a normal distribution with mean of 68° (1.19 radians) and standard deviation of 36° (0.63 radians).Exercise: Modify your model from the chemotactic walk tutorial to change the random uniform sampling to this “smarter” sampling. Compare the chemotactic walk strategy and this smarter strategy by calculating the mean and standard deviation of each cell’s distance to the goal for 500 simulated cells with the collection of tumbling frequencies time_exp = [0.2, 0.5, 1.0, 2.0, 5.0]. Do these simulated cells do a better job of finding the goal?More recent research suggests that when the bacterium is moving up an attractant gradient, the degree of reorientation may be even smaller[1]. Do you think that such a reorientation strategy would improve a cell’s chemotaxis response?Exercise: Modify your model from the previous exercise so that if the cell has just made a move of increasing ligand concentration, then its mean reorientation angle is 0.1 radians smaller. Calculate the mean and standard deviation of each cell’s distance to the goal for 500 cells with time_exp = [0.2, 0.5, 1.0, 2.0, 5.0]. 
Do the cells find the goal faster?Can’t get enough BioNetGen?As we have seen in this module, BioNetGen is very good at simulating systems that involve a large number of species and particles but can be summarized with a small set of rules.Polymerization reactions offer another good example of such a system. Polymerization is the process by which monomer molecules combine into chains called polymers. Biological polymers are everywhere, from DNA (formed of monomer nucleotides) to proteins (formed of monomer amino acids) to lipids (formed of monomer fatty acids). For a nonbiological example, polyvinyl chloride (which lends its name to “PVC pipe”) is a polymer made up of many vinyl monomers.We would like to simulate the polymerization of copies of a monomer A to form a polymer AAAAAA…, where the length of the polymer is allowed to vary. If we simulate this process, we are curious what the distribution of the polymer lengths will be.We will write our polymer reaction as A_m + A_n -> A_(m+n), where A_m denotes a polymer consisting of m copies of A. Using classical reaction rules, this would require an infinite number of reactions; will BioNetGen come to our rescue?There are two sites on the monomer A that are involved in a polymerization reaction: the “head” and the “tail”. For two monomers to bind, we need the head on one monomer and the tail on another to both be free. The following BioNetGen model is taken from the BioNetGen tutorials.Create a new BioNetGen file and save it as polymers.bngl. We will have only one molecule type: A(h,t); the h and t labels indicate the “head” and “tail” binding sites, respectively. 
To model polymerization, we will need to represent four reaction rules: initializing the series of polymerization reactions: two unbound copies of A form an initial dimer, or a polymer with just two monomers; adding an unbound A to the “tail” of an existing polymer; adding an existing polymer to the “tail” of an unbound A; and adding an existing polymer to the “tail” of another polymer.To select any species that is bound at a component, we use the notation !+; for example, A(h!+,t) will select any A whose “head” is bound, whether it belongs to a chain of one or one million monomers.We will assume that the forward and reverse reactions for each rule occur at the same rate. For simplicity, we will set all forward and reverse reaction rates to be equal to 0.01.We will initialize our simulation with 1000 unbound A monomers and observe the formation of polymer chains of a few different lengths (1, 2, 3, 5, 10, and 20). To do so, we can use an “observable” A == n to denote that a polymer contains n copies of A. We need to use Species instead of Molecules to select polymer patterns.

begin seed species
  A(h,t) 1000
end seed species
begin observables
  Species A1 A==1
  Species A2 A==2
  Species A3 A==3
  Species A5 A==5
  Species A10 A==10
  Species A20 A==20
  Species ALong A>=30
end observables

For this model, the infinite number of possible interactions will slow down the Gillespie algorithm. For that reason, we will use an alternative to the Gillespie algorithm called network-free simulation, which tracks individual particles.After building the model, we can run our simulation with the following command (note that we do not need the generate_network() command):

simulate({method=>"nf", t_end=>100, n_steps=>1000})

Exercise: Run the simulation. What happens to the concentration of shorter polymers? What about the longer polymers? 
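As a rough intuition for the dynamics, here is a toy, irreversible Python stand-in for the head-tail joining rules: chains merge at random, and we then tally the length distribution. This is not BioNetGen's network-free algorithm (dissociation and the head/tail site bookkeeping are omitted); it is just a sketch of the A_m + A_n -> A_(m+n) dynamics, with invented parameter values.

```python
import random

def polymerize(n_monomers=1000, n_events=800, seed=1):
    """Toy, irreversible version of the joining rules above: repeatedly pick
    two chains and join the head of one to the tail of the other."""
    rng = random.Random(seed)
    chains = [1] * n_monomers                 # each entry is a chain length
    for _ in range(n_events):
        if len(chains) < 2:
            break
        # choose two distinct chains to merge
        i, j = rng.sample(range(len(chains)), 2)
        merged = chains[i] + chains[j]        # A_m + A_n -> A_(m+n)
        for k in sorted((i, j), reverse=True):
            chains.pop(k)
        chains.append(merged)
    return chains

chains = polymerize()
# distribution of the chain lengths tracked by the observables block
counts = {n: sum(1 for c in chains if c == n) for n in (1, 2, 3, 5, 10, 20)}
```

Since every joining event replaces two chains with one, the number of chains drops by one per event while the total monomer count stays fixed, so short chains are steadily consumed as longer ones accumulate.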
You may like to play around with the lengths of the polymers that we are interested in.Exercise: What happens if we also tweak the reaction rates so that bonding is a little more likely than dissociation? What if dissociation is more likely? Does this reflect what you would guess?Next module [1] Saragosti J, Calvez V, Bournaveas N, Perthame B, Buguin A, Silberzan P. 2011. Directional persistence of chemotactic bacteria in a traveling concentration wave. PNAS. Available online. ↩ "
} ,
{
"title" : "Modeling a Bacterium's Response to an Attractant Gradient",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_gradient",
"date" : "",
"content" : "Traveling up an attractant gradientIn the previous lesson, we saw that E. coli is able to adapt its default tumbling frequency to the current background concentration of attractant. To model this behavior, we used the Gillespie algorithm and the rule-based language of BioNetGen to simulate an instantaneous increase in concentration from one stable concentration level to another.Yet imagine a glucose cube in an aqueous solution. As the cube dissolves, a gradient will form, with a decreasing glucose concentration that radiates outward from the cube. How will the tumbling frequency of E. coli change if the bacterium is moving up a gradient of increasing attractant concentration? Will the tumbling frequency decrease continuously as well, or will the methylation pathways mentioned in the previous lesson cause more complicated behavior?Furthermore, once the cell reaches a region of high attractant concentration, will its default tumbling frequency stabilize to the same steady-state? And how much does this steady-state tumbling frequency change as we alter the “steepness” of the attractant gradient (i.e., how quickly the attractant concentration increases)?In the following tutorial, we will modify our model from the previous lesson by increasing the concentration of the attractant ligand at an exponential rate and seeing how the concentration of phosphorylated CheY changes. Moreover, we will examine how this concentration changes as we change the gradient’s “steepness”, or the rate at which attractant ligand is increasing.Visit tutorialSteady-state tumbling frequency is robust when traveling up an attractant gradientRecall that we used the expression [L] to denote the concentration of ligand L and l0 to denote the initial concentration of the ligand. 
If the concentration of the ligand is growing exponentially, then [L] = l0 · e^(k · t), where t is the time since the start and k is a parameter dictating exponential growth; the higher the value of k, the faster the growth in the ligand concentration. For example, the following figure depicts the concentration of phosphorylated CheY (shown in blue) over time when l0 = 1000 and k = 0.1. The concentration of phosphorylated CheY, and therefore the tumbling frequency, still decreases sharply as the ligand concentration increases, but after all ligands become bound to receptors (shown by the plateau in the red curve), the methylation of receptors causes the concentration of phosphorylated CheY to return to its equilibrium. In other words, for these values of l0 and k, the outcome is similar to when we provided an instantaneous increase in ligand, although the cell takes longer to reach a minimum concentration of phosphorylated CheY because the attractant concentration is increasing gradually. Plots of relevant molecule concentrations in our system when the concentration of ligand grows exponentially with l0 = 1000 and k = 0.1. The concentration of bound ligand (shown in red) quickly hits saturation, which causes a minimum in phosphorylated CheY (and therefore a low tumbling frequency). To respond, the cell increases the methylation of receptors, which boosts the concentration of phosphorylated CheY back to equilibrium. Our next question is what happens as we change k, the growth rate of the ligand concentration. The following figure shows the results of multiple simulations in which we vary the growth parameter k and plot the concentration of phosphorylated CheY over time. The larger the value of k, the faster the increase in receptor binding, and the steeper the drop in the concentration of phosphorylated CheY. Plots of phosphorylated CheY for different growth rates k of the concentration of ligand.
The larger the value of k, the steeper the initial drop as the concentration of bound ligand becomes saturated, and the faster that the concentration of phosphorylated CheY returns to equilibrium. More importantly, the above figure further illustrates the robustness of bacterial chemotaxis to the rate of growth in ligand concentration. Whether the growth of the attractant is slow or fast, methylation will always bring the cell back to the same equilibrium concentration of phosphorylated CheY, and therefore the same background tumbling frequency. Reversing the attractant gradient. And what if a cell is moving away from an attractant, down a concentration gradient? We would hope that the cell would be able to increase its tumbling frequency (i.e., increase the concentration of phosphorylated CheY), and then restore the background tumbling frequency by removing methylation. To simulate a decreasing gradient, we will model a cell in a high ligand concentration that is already at steady-state, meaning that methylation is also elevated. In this case, the ligand concentration will decay exponentially, meaning that the ligand concentration is still given by the equation [L] = l0 · e^(k · t), but k is negative. STOP: If k is negative, what happens to the plot of [L] = l0 · e^(k · t) for decreasing values of k? How do you think the value of k will affect the concentration of phosphorylated CheY over time? You may like to modify the previous tutorial on your own to account for traveling down an attractant gradient. If not, we are still happy to provide a separate tutorial below. Visit tutorial. Steady-state tumbling frequency remains robust when traveling down an attractant gradient. The following figure shows the plot of molecules in our model as the concentration of attractant ligand decreases exponentially with l0 = 10^7 and k equal to -0.3.
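The exponential ligand schedules used in these simulations are easy to reproduce numerically. Below is a minimal sketch of our own (the function name is ours, not part of the module's tutorials) of [L] = l0 · e^(k · t) for both a rising and a falling gradient:

```python
import math

def ligand_concentration(l0, k, t):
    """Ligand concentration [L] = l0 * e^(k * t); k > 0 models an
    increasing gradient, k < 0 a decreasing one."""
    return l0 * math.exp(k * t)

# Increasing gradient (l0 = 1000, k = 0.1): an e-fold rise every 10 time units
print(ligand_concentration(1000, 0.1, 10))   # ≈ 2718.3

# Decreasing gradient (l0 = 1e7, k = -0.3), as in the reversed-gradient case
print(ligand_concentration(1e7, -0.3, 10))   # ≈ 4.98e5
```

Sweeping k here mirrors the simulations above: a larger |k| compresses the rise (or fall) in ligand into a shorter window of time.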
As the ligand concentration decreases, the concentration of bound ligand plummets as bound ligands dissociate and there are not enough free ligands to replace the dissociating ones. In the absence of ligand-receptor binding, CheY can readily phosphorylate, causing a spike in phosphorylated CheY. Demethylation of receptors then causes the concentration of phosphorylated CheY to steadily return back to its equilibrium. Simulating a bacterium traveling down an attractant gradient with l0 = 10^7 and k equal to -0.3. Phosphorylated CheY follows the opposite pattern to traveling up an attractant gradient, with the concentration of phosphorylated CheY rising quickly only to slowly decrease to equilibrium due to demethylation. To be thorough, we should also test the robustness of our model to see whether the CheY concentration will return to the same steady state for a variety of values of k when k is negative. As in the case of an increasing gradient, the figure below shows that the more sudden the change in the concentration of attractant, the sharper the spike. And yet regardless of the value of k, methylation does its work to bring the concentration back to the same steady state. More importantly, this figure and the one above are confirmed by experimental observations.1 Varying values of k in our exponential decrease in the concentration of attractant ligand produce the same equilibrium concentration of phosphorylated CheY. The smaller the value of k, the steeper the initial spike, and the faster the recovery to steady state. From changing tumbling frequencies to an exploration algorithm. We hope that in exploring this module, you have gained an appreciation for the elegant mechanism of bacterial chemotaxis, as well as the power of BioNetGen’s rule-based modeling for simulating a complex biochemical system without the need to keep track of individual particles. And yet we have made a major omission. E.
coli goes to great lengths to ensure that if it detects a relative increase in concentration (i.e., an attractant gradient), then it can reduce its tumbling frequency in response. But what we have not explored is why this change in the bacterium’s tumbling frequency would help it find food. After all, the direction that a bacterium is moving at any point in time is random! So why would a decrease in tumbling frequency help E. coli move toward an attractant? This question is a biologically deep one, and it has no intuitive answer. However, in this module’s final lesson, we will build a model to explain why E. coli’s random-walk algorithm with a variation in tumbling frequency is in fact an extremely clever way of locating resources in a strange new land. Next lesson Krembel A., Colin R., Sourjik V. 2015. Importance of multiple methylation sites in Escherichia coli chemotaxis. Available online ↩ "
} ,
{
"title" : "Exercises",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_moreexercise",
"date" : "",
"content" : "How to calculate steady state concentration in a reversible bimolecular reaction?Earlier in this chapter we learned how to calculate equilibrium concentrations of a reversible bimolecular reaction. It’s time to get some exercise.Exercise 1: How would the concentration of molecules change before and after the system reaches the steady state?Exercise 2: We have three types of molecules in the system, A, B, and AB. The molecules could form a complex via reaction A + B → AB, and the complexes could dissociate via reaction AB → A + B. Assume we know that kbind = 3, kdissociate = 3. Currently, the concentrations of each type of molecules are [A] = 95, [B] = 95, [AB] = 5. If we allow the system to continue to react, what are the concentrations of each type of molecules at the steady state?Exercise 3: Now assume we are at the equilibrium state and add additional 100 A molecules. What reactions would happen to the system and how would equilibrium concentrations change? What if kdissociate = 3 becomes 9 (without the additional 100 A molecules)? Verify your predictions with calculation.How to simulate a reaction step with the Gillespie algorithm?We learned that Poisson distribution, exponential distribution, and Gillespie algorithm are behind the BioNetGen simulation. Let’s try to simulate step by step in this way.Exercise 1: We are interested in the “wait time” between individual reactions. Will wait time be longer or shorter if we have more molecules in the system?Exercise 2: In the chapter we used counting customers entering the store as an example. Now we will think about a chemical system instead. Say that you are looking at a flask, and you have noticed that on average 100 reactions happen per second. What is the probability that exactly 100 reaction happen in the next second? Now you would like to see how long does it take for the next reaction to occur. How long would you expect to wait? 
What is the probability that the first reaction occurs after 0.02 seconds? Hint: What is the λ in your system? What is the mean value of an exponential distribution? Exercise 3: Now let’s consider a very simplified bimolecular reaction system. There are two types of molecules, the ligand L and the receptor T. The reaction rate constant for binding is kbind = 1 (molecule • s)^-1, and for dissociation it is kdissociate = 2 s^-1. Initially, the system contains 10 L molecules and 10 T molecules, and there is no LT present yet. How long would you expect to wait before the first reaction occurs? Is it possible that the first reaction occurs after 0.1 s? What is your first reaction, and what molecules are present in the system after your first reaction? We continue to observe the reaction system after the first reaction occurs. What reactions are possible in the system now? How long would you expect to wait before the next reaction occurs? What are the probabilities of each possible reaction? Options other than run-and-tumble? Run-and-tumble is the most studied bacterial chemotaxis strategy. However, microorganisms have different sizes, shapes, evolutionary histories, and metabolisms, and they live in a very diverse range of environments. When we shift our focus to other microorganisms in other environments, run-and-tumble may not be optimal, and many other strategies exist.1 Exercise 1: What would work better for the organism if the organism is very small even compared to an E. coli; the organism moves extremely slowly; or the organism could sense the gradient along its body? An overview of different chemotaxis strategies. The lighter to darker grayscale gradient indicates lower to higher concentrations. The solid lines indicate the trajectories of the organisms. Screenshot from Mitchell 2006.1 The (f) strategy above is called helical klinotaxis and is often used by larger aquatic microorganisms.
One example is dinoflagellates, single-celled eukaryotes found in marine or freshwater habitats (also responsible for the bioluminescent waves at San Diego). Dinoflagellates move along helical trajectories, while the net direction of movement is along the increasing concentration of attractants. As illustrated in the figure, the rotation of the trailing flagellum produces an angular velocity ω1, and the rotation of the transversal flagellum produces an angular velocity ω2. The net direction of the trajectory is determined by ω1/ω2. Dinoflagellate helical klinotaxis details. Screenshot from Fenchel 2001.2 Dinoflagellate helical klinotaxis trajectories. Screenshot from Fenchel 2001.2 Exercise 2: What benefit might be associated with this kind of movement? Exercise 3: On a high level, how might we model this strategy? Mitchell J.G., Kogure K. 2006. Bacterial motility: links to the environment and a driving force for microbial physics. FEMS Microbiol Ecol 55:3-16. Available online ↩ ↩2 Fenchel T. 2001. How dinoflagellates swim. Protist 152(4):329-338. Available online ↩ ↩2 "
} ,
{
"title" : "Signaling and Ligand-Receptor Dynamics",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_signal",
"date" : "",
"content" : "Cells can detect signals via bonding to receptor proteinsChemotaxis is one example of many ways in which an organism must be able to perceive a change in its environment and react accordingly. This response is governed by a process called signal transduction, in which a cell identifies a stimulus outside the cell and then transmits this stimulus into the cell in order to effect a response.Although we did not focus on the details at that time, we have already seen an example of signal transduction when we discussed the activation of transcription factors in the previous module. When a certain type of molecule’s extracellular concentration increases, receptor proteins on the outside of the cell have more frequent bonding with these molecules and are therefore able to detect changes in molecular concentration. This “signal” is then “transduced” via a series of internal chemical processes that changes a transcription factor into an active state.In the case of chemotaxis, E. coli has receptor proteins that detect attractants such as glucose by binding to and forming a complex with these attractant ligands. The cell also contains receptors to detect repellents, but in this module, we will focus primarily on attractants.In this lesson, we will discuss how the bacterium is able to detect this molecular signal; in the next lesson, we focus on how the bacterium can convert the detected signal into an internal sequence of reactions that lead to a change in movement. See the figure below for a high-level overview of this process.An overview of the signaling pathway of chemotaxis. The red circles represent attractant ligands(L). When ligands bind to receptors, this signal is transduced via a series of enzymes, and it finally influences the rotation direction of a flagellum. We will discuss how this response is achieved in a later lesson.Modeling ligand-receptor dynamicsAlthough E. 
coli has different types of surface receptors that can sense a variety of different attractant/repellent ligands in its environment, we will focus on how to model the binding of a single type of receptor to a single type of attractant ligand. The chemical reactions that we have considered earlier in this course are irreversible, meaning they can only proceed in one direction. For example, in the prologue’s reaction-diffusion model for Turing patterns, we had the reaction A + 2B → 3B, which we conceptualized as two predators eating a prey and reproducing. But we did not have the reverse reaction 3B → A + 2B. To model ligand-receptor dynamics, we will use a reversible reaction that proceeds continuously in both directions at possibly different rates. If a ligand collides with a receptor, then there is some probability that the two molecules will bond into a complex. But at the same time, in any unit of time, there is also some probability that a bound receptor-ligand complex will dissociate into two separate molecules. In a future module, we will discuss the biochemical details underlying what makes two molecules more or less likely to bond, but for now, we assert that the more suited a receptor is to a ligand, the higher the bonding rate and the lower the dissociation rate. Why should ligand-receptor bonding be reversible? First, surface receptors are typically complicated molecules, and it would be costly to an organism if it needed to keep manufacturing surface receptors rather than sometimes releasing bound ligands. Second, if complexes did not dissociate, then a brief increase in ligand concentration would be detected by an organism indefinitely. We will say more about how the cell responds to a changing concentration of ligand soon. For now, we will start building a model of ligand-receptor dynamics. We denote the receptor molecule by T, the ligand molecule by L, and the bound complex as TL.
We have the forward reaction T + L → TL, which takes place at some rate kbind, and the reverse reaction TL → T + L, which takes place at some rate kdissociate. If we start with a free floating supply of T and L molecules, what will happen? TL will initially be formed quickly at the expense of the free-floating T and L molecules; the reverse reaction will not occur because of the lack of TL complexes. As the concentration of TL grows and the concentrations of T and L decrease, the rate of increase in TL will slow. Eventually, the number of TL complexes being formed by the forward reaction will balance the number of TL complexes being split apart by the reverse reaction. At this point, called a steady state or equilibrium, the concentration of all particles will stabilize. Calculation of the steady state concentration in a reversible ligand-receptor reaction. In fact, we can calculate the steady state concentrations of T and L for our reversible reaction by hand. Suppose that we begin with initial concentrations of T and L that are represented by t0 and l0, respectively. Let [L], [T], and [LT] denote the concentrations of the three molecule types. And assume that the reaction rate constants kbind and kdissociate are fixed. Our goal is to find the steady state concentration of LT. When this occurs, we know that the rate of production of LT is equal to the rate of its dissociation; in other words, we know that kbind · [L] · [T] = kdissociate · [LT]. We also know that by the law of conservation of mass, the total concentrations of L and T molecules (free plus bound) are always constant across the system. In particular, these totals are equal to the initial concentrations.
That is, at any time point, we have that [L] + [LT] = l0 and that [T] + [LT] = t0. Solving these equations for [L] and [T] yields the following two equations: [L] = l0 - [LT] and [T] = t0 - [LT]. We will now substitute the expressions on the right for [L] and [T] into our original steady state equation: kbind · (l0 - [LT]) · (t0 - [LT]) = kdissociate · [LT]. Expanding the left side of this equation gives us the following updated equation: kbind · [LT]^2 - (kbind · l0 + kbind · t0) · [LT] + kbind · l0 · t0 = kdissociate · [LT]. Finally, we subtract the right side of this equation from both sides: kbind · [LT]^2 - (kbind · l0 + kbind · t0 + kdissociate) · [LT] + kbind · l0 · t0 = 0. This equation may look daunting, but most of its components are constants. In fact, the only unknown is [LT], which makes this a quadratic equation, or an equation of the form a · x^2 + b · x + c = 0 for constants a, b, and c and a single unknown x. For this quadratic equation, we have the constants a = kbind, b = - (kbind · l0 + kbind · t0 + kdissociate), and c = kbind · l0 · t0. The quadratic formula — which you may have thought you would never use again — tells us that the equation a · x^2 + b · x + c = 0 has solutions for x given by the following equation:\[x = \dfrac{-b \pm \sqrt{b^2 - 4 \cdot a \cdot c}}{2 \cdot a}\] STOP: Use the quadratic formula to solve for [LT] in our previous equation and find the steady state concentration of LT. How can we use this solution to find the steady state concentrations of L and T as well? Now that we have reduced the computation of the steady state concentration of LT to the solution of a quadratic equation, let’s compute this steady state concentration for a sample collection of parameters.
We will then change the parameters and see how the steady state concentration changes. Say that we are given the following parameter values (the units of these parameters are not important for this toy example): kbind = 2; kdissociate = 5; l0 = 50; t0 = 50. Substituting these values into the quadratic equation, we obtain the following: a = kbind = 2; b = - (kbind · l0 + kbind · t0 + kdissociate) = -205; c = kbind · l0 · t0 = 5000. That is, we are solving the equation 2 · [LT]^2 - 205 · [LT] + 5000 = 0. Using the quadratic formula to solve for [LT] gives\([LT] = \dfrac{205 \pm \sqrt{205^2 - 4 \cdot 2 \cdot 5000}}{2 \cdot 2} = 51.25 \pm 11.25\). It would seem that there are two solutions: 51.25 + 11.25 = 62.5 and 51.25 - 11.25 = 40. However, because l0 and t0, the respective initial concentrations of L and T, are both equal to 50, we cannot have that the steady state concentration of LT is 62.5; as a result, it must be 40. Now that we know the steady state concentration of LT, we can recover the values of [L] and [T] too: [L] = l0 - [LT] = 10 and [T] = t0 - [LT] = 10. What if the forward reaction were slower? We would imagine that the equilibrium concentration of LT would decrease, since the reverse reaction will occur faster than the forward reaction. For example, if we change kbind to 1, then we obtain the following adjusted parameter values: a = kbind = 1; b = - (kbind · l0 + kbind · t0 + kdissociate) = -105; c = kbind · l0 · t0 = 2500. In this case, if we solve for [LT], we obtain [LT] = 36.492; the steady state concentration has decreased as anticipated. STOP: What do you think will happen to the steady state concentration of LT if the initial concentration (l0) increases or decreases? What if the dissociation rate (kdissociate) increases or decreases?
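A short script makes it easy to rerun this calculation for different parameters. This is our own sketch, not part of the module's tutorials: it solves the quadratic derived above for [LT] and keeps the root that does not exceed the initial concentrations.

```python
import math

def steady_state_LT(k_bind, k_dissociate, l0, t0):
    """Steady-state [LT] for T + L <-> TL, from the quadratic
    a*[LT]^2 + b*[LT] + c = 0 derived in the text."""
    a = k_bind
    b = -(k_bind * l0 + k_bind * t0 + k_dissociate)
    c = k_bind * l0 * t0
    disc = math.sqrt(b * b - 4 * a * c)
    root_minus = (-b - disc) / (2 * a)
    root_plus = (-b + disc) / (2 * a)
    # Only the smaller root is physical: [LT] cannot exceed l0 or t0.
    return root_minus if root_minus <= min(l0, t0) else root_plus

lt = steady_state_LT(2, 5, 50, 50)
print(lt)                # 40.0, matching the worked example
print(50 - lt, 50 - lt)  # [L] = [T] = 10.0
```

Lowering kbind to 1 makes the same function return roughly 36.49, matching the adjusted example; it also reproduces the experimentally derived values discussed later (kbind = 0.0146, kdissociate = 35, l0 = 10000, t0 = 7000 give [LT] ≈ 4793).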
Confirm your prediction by changing the parameters above and solving the quadratic formula for [LT]. Steady state ligand-receptor concentrations for an experimentally verified example. Let’s use our formula to show how we could determine the steady state concentration of bound receptor-ligand complexes using values obtained from experimental results. We will model an E. coli cell with 7,000 receptor molecules in an environment containing 10,000 ligand molecules. The experimentally verified bonding rate is kbind = 0.0146 ((molecules/µm^3)^-1) s^-1, and the dissociation rate constant is kdissociate = 35 s^-1.123 As an aside, we note that if you find the above units confusing, you are not alone. To clarify these units, consider that the concentration of a particle will be measured in molecules/µm^3, or number of molecules per unit volume. So when we multiply the bonding rate by the concentrations of L and T particles, then the units become ((molecules/µm^3)^-1) s^-1 · (molecules/µm^3) · (molecules/µm^3) = (molecules/µm^3) s^-1. That is, the resulting units are in molecules/µm^3 per second, which corresponds to the rate at which the concentration of LT complexes is increasing. On the other hand, when LT complexes dissociate, we multiply the dissociation constant by the units of LT concentration and obtain the same units as before: (s^-1) · (molecules/µm^3) = (molecules/µm^3) s^-1. For these parameters, we obtain the following constants a, b, c in the quadratic equation: a = kbind = 0.0146; b = - (kbind · l0 + kbind · t0 + kdissociate) = -283.2; c = kbind · l0 · t0 = 1022000. When we solve for [LT] in the quadratic equation, we obtain [LT] = 4793. Now that we have this value along with l0 and t0, we can solve for [L] and [T] as well: [L] = l0 - [LT] = 5207 and [T] = t0 - [LT] = 2207. Next lesson Li M, Hazelbauer GL. 2004. Cellular stoichiometry of the components of the chemotaxis signaling complex. Journal of Bacteriology. Available online ↩ Spiro PA, Parkinson JS, and Othmer H. 1997.
A model of excitation and adaptation in bacterial chemotaxis. PNAS 94:7263-7268. Available online. ↩ Stock J, Lukat GS. 1991. Intracellular signal transduction networks. Annual Review of Biophysics and Biophysical Chemistry. Available online ↩ "
} ,
{
"title" : "Stochastic simulation of multiple chemical reactions in a well mixed environment",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_signalpart2",
"date" : "",
"content" : "Verifying a theoretical steady state concentration via stochastic simulationIn the previous module, we saw that we could avoid keeping track of the positions of individual diffusing particles in a simulation if we assume that these particles are well-mixed, i.e., uniformly distributed throughout their environment. The E. coli cell is so small that we will assume that the concentration of any particle in its immediate surroundings is uniform. Therefore, as a proof of concept, let us see if a well-mixed simulation replicates the steady state concentrations of particles that we just found.Even though we can calculate steady state concentrations by hand, we will find a particle-free simulation useful for two reasons. First, this simulation will give us snapshots of the concentrations of particles in the system over multiple time points and allow us to see how quickly the concentrations reach equilibrium. Second, we will soon expand our model of chemotaxis to have many particles and reactions that depend on each other, and direct mathematical analysis of the system like what we have done in the previous lesson will not just be tedious; it will quickly become impossible as the number of particles and reactions grows.The difficulty at hand is comparable to the famed “n-body problem” in physics. Predicting the motions of two celestial objects interacting due to gravity can be done exactly, but there is no known such solution once we add more bodies to the system.Our particle-free model will apply an approach called Gillespie’s Stochastic Simulation Algorithm, which is often called the Gillespie algorithm or just SSA for short. Before we explain how this algorithm works, we take a short detour to provide some needed probabilistic context.The Poisson and exponential distributionsSay that you own a store and have noticed that on average, there are λ customers entering your store in a single hour. 
Let X denote the number of customers that enter the store in the next hour; X is an example of a random variable because it may change based on random chance. If we assume that customers are independent actors and that two customers cannot arrive at the exact same time, then X follows a distribution called a Poisson distribution; it can be shown that for a Poisson distribution, the probability that exactly n customers arrive in the next hour is\[\mathrm{Pr}(X = n) = \dfrac{\lambda^n e^{-\lambda}}{n!}\,.\] A derivation of this formula is beyond the scope of our work here, but if you are interested in one, please consider this post by Andrew Chamberlain. Furthermore, the probability of observing exactly n customers in t hours, where t is an arbitrary positive number, is\[\dfrac{(\lambda t)^n e^{-\lambda t}}{n!}\,.\] We can also ask how long we will typically have to wait for the next customer to arrive. Specifically, what are the chances that this customer will arrive after t hours? If we let T be the random variable corresponding to the wait time on the next customer, then the probability of T being at least t is the probability of seeing zero customers in t hours:\[\mathrm{Pr}(T > t) = \mathrm{Pr}(X = 0) = \dfrac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}\,.\] In other words, the probability \(\mathrm{Pr}(T > t)\) decays exponentially over time as t increases. For this reason, the random variable T is said to follow an exponential distribution.
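These two distributions are easy to probe numerically. The sketch below is our own illustration (the store example maps directly onto chemical reactions): it checks that sampled exponential wait times have mean 1/λ and that Pr(T > t) ≈ e^(-λt).

```python
import math
import random

random.seed(0)
lam = 100.0  # average number of events (customers, or reactions) per unit time

def poisson_pmf(n, lam):
    """Poisson probability of exactly n events in one unit of time."""
    return lam**n * math.exp(-lam) / math.factorial(n)

# Sample many exponential wait times and compare against theory.
samples = [random.expovariate(lam) for _ in range(200_000)]
mean_wait = sum(samples) / len(samples)
frac_longer = sum(s > 0.02 for s in samples) / len(samples)

print(round(mean_wait, 4))    # close to 1/lam = 0.01
print(round(frac_longer, 3))  # close to e^(-lam * 0.02) ≈ 0.135
print(poisson_pmf(100, lam))  # chance of exactly 100 events, ≈ 0.04
```

The last line answers the kind of question posed in this chapter's exercises: even at the average rate, seeing exactly λ events in one unit of time is only about a 4% proposition.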
It can be shown that the mean value of the exponential distribution (i.e., the average amount of time we will need to wait for the next event to occur) is 1/λ. STOP: What is the probability Pr(T < t)? An overview of the Gillespie algorithm. The engine of the Gillespie algorithm runs on a single question: given a well-mixed environment of particles and a reaction involving those particles taking place at some average rate, how long should we expect to wait before this reaction occurs somewhere in the environment? This is the same question we asked in the previous section; we have simply replaced customers entering a store with chemical reactions. Therefore, an exponential distribution can be used to model the “wait time” between individual reactions. The more reactions we have, and the faster these reactions occur, the larger the value of λ, meaning that we typically do not have to wait very long for the next reaction. Numerical methods exist that allow us to generate a random number simulating the wait time of an exponential distribution. By repeatedly sampling from the exponential distribution, we obtain a collection of varying wait times between consecutive occurrences of the reaction. Once a wait time is selected, we must determine the reaction to which this event corresponds. If the rates of the reactions are all equal, then this is an easy problem; we simply choose one of the reactions with equal probability. But if the rates of these reactions are different, then we should choose one of the reactions via a probability that is weighted in direct proportion to the rate of the reaction; that is, the larger the rate of the reaction, the more likely that this reaction corresponds to the current event.1 We will illustrate the Gillespie algorithm by returning to our ongoing example, in which we are modeling the forward and reverse reactions of ligand-receptor binding and dissociation, respectively.
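A single wait-then-choose event of this process can be sketched in a few lines. This is our own illustration of the simplified two-reaction setting used in this example, where λ = kbind + kdissociate; in a full Gillespie simulation, each rate would instead be a propensity, i.e., a rate constant scaled by the current molecule counts.

```python
import random

def gillespie_step(k_bind, k_dissociate, rng=random):
    """One simulated event: draw an exponential wait time with rate
    k_bind + k_dissociate, then pick the reaction with probability
    proportional to its rate."""
    total = k_bind + k_dissociate
    wait = rng.expovariate(total)  # mean wait time is 1/total
    reaction = "bind" if rng.random() < k_bind / total else "dissociate"
    return wait, reaction

random.seed(1)
wait, reaction = gillespie_step(3.0, 1.0)
print(wait, reaction)  # a random wait time; "bind" is chosen 75% of the time
```

Repeating this step, advancing the clock by each wait time, and updating molecule counts after each chosen reaction yields the full trajectory of the system.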
First, a wait time is chosen according to an exponential distribution with mean value 1/(kbind + kdissociate); that is, λ is equal to the sum of reaction rates kbind + kdissociate. The probability that the event corresponds to a binding reaction is given by Pr(L + T → LT) = kbind/(kbind + kdissociate), and the probability that the event corresponds to a dissociation reaction is Pr(LT → L + T) = kdissociate/(kbind + kdissociate). STOP: Verify that these two probabilities sum to 1. The process of selecting a reaction is visualized in the figure below. A visualization of a single reaction event used by the Gillespie algorithm for ligand-receptor binding/dissociation. Red circles represent ligands (L), and orange wedges represent receptors (T). The wait time for the next reaction is drawn from an exponential distribution with mean 1/(kbind + kdissociate). The probability of this event corresponding to a binding or dissociation reaction is proportional to the rate of the respective reaction. Specifying ligand-receptor binding with a single BioNetGen rule. Throughout this module, we will employ BioNetGen to build particle-free simulations of chemotaxis applying the Gillespie algorithm. We will have two molecules corresponding to the ligand and receptor L and T that we call L(t) and T(l), respectively. The (t) specifies that molecule L contains a binding site for T, and the (l) specifies that molecule T contains a binding site for L. We will use these components later when specifying reactions. We do not have to use t and l for this purpose, but it will make our model easier to understand. BioNetGen reaction rules are written similarly to chemical equations. The left side of the rule includes the reactants, which are followed by a unidirectional or bidirectional arrow, indicating the direction of the reaction, and the right side of the rule includes the products.
After the reaction, we indicate the rate constant of the reaction; if the reaction is bidirectional, then we separate the forward and backward reaction rate constants with a comma. For example, to code up the bidirectional reaction A + B <-> C with forward rate k1 and reverse rate k2, we would write A + B <-> C k1, k2. Our model consists of a single bidirectional reaction and will have only a single rule. The left side of this rule will be L(t) + T(l); by specifying L(t) and T(l), we indicate to BioNetGen that we are only interested in unbound ligand and receptor molecules. If we had wanted to select any ligand molecule, then we would have simply written L + T. On the right side of the rule, we will have L(t!1).T(l!1), which indicates the formation of the intermediate. In BioNetGen, ! indicates the formation of a bond, and a unique character specifies the possible location of this bond. In our case, we use the character 1, so that the bond is represented by !1. The symbol . is used to indicate that the two molecules are joined into a complex. Since the reaction is bidirectional, we will use k_lr_bind and k_lr_dis to denote the rates of the forward and reverse reactions, respectively. (We will specify values for these parameters later.) As a result, this rule is shown below. We name our rule specifying the ligand-receptor reaction LR. LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis. The following tutorial shows how to implement this rule in BioNetGen and use the Gillespie algorithm to determine the equilibrium of a reversible ligand-receptor binding reaction. Visit tutorial. Does a simulation confirm our steady state calculations? We previously showed a worked example in which a system with 10,000 free ligand molecules and 7,000 free receptor molecules produced the following steady state concentrations using the bonding rate kbind = 0.0146 ((molecules/µm^3)^-1) s^-1 and dissociation rate kdissociate = 35 s^-1.
[LT] = 4793, [L] = 5207, [T] = 2207. The BioNetGen model covered in the previous tutorial uses the same number of initial molecules and the same reaction rates. The system evolves via the Gillespie algorithm, and we track the concentration of free ligand molecules, ligand molecules bound to receptor molecules, and free receptor molecules over time. Our goal is to see whether the concentrations reach a steady state, and whether that steady state matches our calculation. The figure below demonstrates that the Gillespie algorithm quickly converges to the same values as the ones that we obtained by hand in the last lesson. As a result, we can see the power of using a particle-free stochastic simulator to quickly obtain a result without needing to perform any mathematical calculations. A concentration plot over time for ligand-receptor dynamics via a BioNetGen simulation employing the Gillespie algorithm. The concentrations reach a steady state at the end of the simulation that matches the concentrations identified by hand. Yet this simple ligand-receptor model is just the beginning of our study of chemotaxis. In the next section, we will delve into the complex biochemical details of chemotaxis. Furthermore, we will see that the Gillespie algorithm for stochastic simulations will scale easily as our model of this system grows more complex. Next lesson Schwartz R. Biological Modeling and Simulation: A Survey of Practical Models, Algorithms, and Numerical Methods. Chapter 17.2. ↩ "
} ,
{
"title" : "Bacterial Runs and Tumbles",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_walk",
"date" : "",
"content" : "E. coli explores its world via a random walk. An E. coli cell has between five and twelve flagella distributed on its surface.1 Each flagellum can rotate both clockwise and counter-clockwise. When all of the flagella are rotating counter-clockwise, they form a bundle and propel the cell forward at a speed of about 20 µm per second. This speed may seem small, but it is about ten times the length of the cell per second, analogous to a car traveling at 160 kph (100 mph). When any flagellum rotates clockwise, the flagella become uncoordinated, and the bacterium stops and rotates in place.2 When we multi-cellular beings examine the bacterium’s movement, we see it alternate between periods of “running” in a straight line and then “tumbling” in place (see figure below). Over time, the bacterium takes what appears to be a random walk through its environment. Note that this run and tumble view of E. coli movement is similar to the exploration approach used by the Lost Immortals in the introduction. The run and tumble mechanism of bacterial movement produces a random walk. Image from Parkinson Lab. Tumbling frequency is constant across species. In the absence of an attractant or repellent, E. coli stops to tumble once every 1 to 1.5 seconds.34 And it is not alone in this behavior; bacteria living in environments with similar resource distributions adopt similar movements. Salmonella tumbles once every second5, Enterococcus saccharolyticus tumbles once every 1.2 seconds, Bacillus subtilis tumbles once every 2 seconds6, and Rhizobia tumbles once every 1-2 seconds7. Researchers have investigated why different bacteria have different tumbling frequencies,89 but a definitive explanation for the variation in these frequencies has not been proposed. Bacteria are amazingly diverse. They have evolved for over three billion years to thrive in practically every environment on the planet, including hazardous human-made environments. 
They manufacture compounds like antibiotics that larger organisms like ourselves cannot make. Some eukaryotes are even completely dependent upon bacteria to perform some critical task for them, from digesting their food, to camouflaging them from predators, to helping them develop organs10. And yet despite the diversity present within the bacterial kingdom, the variations in bacterial tumbling frequencies are relatively small. Is there some reason why, regardless of the species, a bacterium’s tumbling frequency tends to hover at around one tumble every second or two? It is as if there were some invisible force compelling all of these bacteria to tumble with the same frequency. This question is a fundamental one, and we will return to it at the close of this module after we have learned more about the biochemical basis of chemotaxis and how a bacterium can adjust its behavior in response to a chemical substance. In the process, we will see that despite bacteria being simple organisms, the mechanism they use to implement chemotaxis is far more sophisticated than we might ever imagine. STOP: Say that a bacterium travels 20 µm in a randomly selected direction every second. After an hour, approximately how far will it have traveled on average? What if we allow the bacterium to travel for a week? (Hint: recall the Random Walk Theorem from the prologue.) Next lesson Sim M, Koirala S, Picton D, Strahl H, Hoskisson PA, Rao CV, Gillespie CS, Aldridge PD. 2017. Growth rate control of flagellar assembly in Escherichia coli strain RP437. Scientific Reports 7:41189. Available online ↩ Baker MD, Wolanin PM, Stock JB. 2005. Signal transduction in bacterial chemotaxis. BioEssays 28:9-22. Available online ↩ Weis RM, Koshland DE. 1990. Chemotaxis in Escherichia coli proceeds efficiently from different initial tumble frequencies. Journal of Bacteriology 172:2. Available online ↩ Berg HC. 2000. Motile behavior of bacteria. Physics Today 53(1):24. 
Available online ↩ Achouri S, Wright JA, Evans L, Macleod C, Fraser G, Cicuta P, Bryant CE. 2015. The frequency and duration of Salmonella macrophage adhesion events determines infection efficiency. Philosophical Transactions B 370(1661). Available online ↩ Turner L, Ping L, Neubauer M, Berg HC. 2016. Visualizing flagella while tracking bacteria. Biophysical Journal 111(3):630–639. Available online ↩ Gotz R and Schmitt R. 1987. Rhizobium meliloti swims by unidirectional, intermittent rotation of right-handed flagellar helices. J Bacteriol 169:3146–3150. Available online ↩ Rashid S, Long Z, Singh S, Kohram M, Vashistha H, Navlakha S, Salman H, Oltvai ZH, Bar-Joseph Z. 2019. Adjustment in tumbling rates improves bacterial chemotaxis on obstacle-laden terrains. PNAS 116(24):11770-11775. Available online ↩ Mitchell JG, Kogure K. 2005. Bacterial motility: links to the environment and a driving force for microbial physics. FEMS Microbiol Ecol 55(2006):3–16. Available online ↩ Ed Yong. I Contain Multitudes: The Microbes Within Us and a Grander View of Life. ↩ "
} ,
{
"title" : "Homology Modeling for Protein Structure Prediction",
"category" : "",
"tags" : "",
"url" : "/coronavirus/homology",
"date" : "",
"content" : "Homology modeling uses an existing structure to reduce the search space. In the previous lesson, we saw that ab initio structure prediction of a long protein like the SARS-CoV-2 spike protein can be time-consuming and error-prone. As we mentioned in the introduction to structure prediction, however, researchers have entered over 160,000 structure entries into the PDB. With every new structure that we identify, we gain a little more information about nature’s magic protein folding algorithm. Our goal is to use the information contained in known structures to help us predict the shape of proteins with unknown structure. One of the many PDB entries is the structure of the SARS-CoV spike protein, published in 2003 at the time of the first SARS outbreak. Researchers found that the sequence of this protein is 96% similar to the sequence of the SARS-CoV-2 spike protein. We mentioned earlier in this module that proteins serving the same purpose, called homologous proteins, may have very similar structures even if they have acquired significant mutations. Assuming that the structure of the two coronavirus spike proteins is similar, we will use the structure of the SARS-CoV spike protein as a guide when assembling the SARS-CoV-2 spike protein. In other words, if the search space of all conformations of the SARS-CoV-2 spike protein is enormous, why not reduce the runtime of our algorithms — and improve accuracy — by restricting the search space to the collection of structures that are similar to the shape of the SARS-CoV spike protein? This idea serves as the foundation of homology modeling for protein structure prediction (also called comparative modeling). By using the known protein structure of a homologous protein as a template, we can in theory improve the accuracy of protein structure prediction. How does homology modeling work? In the case of the SARS-CoV-2 spike protein, we already know that we want to use the SARS-CoV spike protein as a template. 
However, if we do not know which template to use before we begin, then we can use a standard approach for searching a protein sequence against a database, such as BLAST. Once we have obtained a template structure that we want to use as a guide for prediction of our given protein’s structure, we need to use the information provided by the template to determine the structure of our protein. Even very similar species will have slight differences in the structures of homologous proteins, and so it will not suffice to simply report the existing structure as the structure of our protein. One way to perform homology modeling is to include an extra “similarity term” in our energy function accounting for similarity to the template structure. That is, the more similar that a candidate structure is to the template, the more negative the contribution of this similarity term; you might like to think of the template protein as “pulling down” nearby structures in the search space. Another way to perform homology modeling is to account for variance in similarity across regions of the two proteins. For example, even though the SARS-CoV and SARS-CoV-2 genomes are 96% similar, this does not mean that the differences between these two genomes are uniformly spaced throughout the genome. When we look at genomes from related species, we expect to see conserved regions where the species are very similar and other variable regions where the species are more different than the average. For example, the spike proteins of SARS-CoV and SARS-CoV-2 are only 76% similar. The phenomenon of conserved and variable regions even occurs within individual genes. For example, as the following figure shows, within the spike protein, the S2 domain is 90% similar between the two viruses, whereas the S1 domain is only 64% similar. Note that there are subregions of greater or lesser variability within each of the two domains! Variable and conserved regions in the SARS-CoV and SARS-CoV-2 spike proteins. 
The S1 domain tends to be more variable, whereas the S2 domain is more conserved (and even has a small region of 100% similarity). In this figure, “NTD” stands for “N-terminal domain” and “RBD” stands for “receptor binding domain”, two subunits of the S1 domain. Source: Jaimes et al. 20201. Some algorithms account for variable and conserved regions in homology modeling by assuming that very conserved regions in the two genes correspond to essentially identical structures in the proteins; that is, the structure of our novel protein in these regions will be the same as those of the template protein. We can then use fragment libraries, or known substructures from a variety of proteins, to fill in the non-conserved regions and produce a final 3-D structure. This approach to homology modeling is called fragment assembly. In the following tutorial, we will model the SARS-CoV-2 spike protein using homology modeling software from three publicly available servers (SWISS-MODEL, Robetta, and GalaxyWEB), all of which apply a variant of the fragment assembly approach. Using three different homology modeling approaches should give us confidence that if the results are similar, then our structure prediction is reasonably robust (a concept that has recurred throughout this course). Furthermore, comparing the results of multiple different approaches may give us more insights into structure prediction. Visit tutorial. Applying homology modeling to the SARS-CoV-2 spike protein. If you did not follow the above tutorial, then the results of the three software resources for predicting the structure of the SARS-CoV-2 spike protein are available for download below. Structure Prediction Server Results SWISS-MODEL (S protein) SWISS-MODEL Results Robetta (Single-Chain S protein) Robetta Results GalaxyWEB GalaxyWEB Results To compare these protein structures, we need a way to represent a protein’s tertiary structure. To do so, we store the 3-D spatial coordinates of every atom in the protein. 
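For a concrete sense of how these coordinates are stored, a single ATOM record of the .pdb format discussed next can be read with plain string slicing. This sketch uses a hand-made sample record, with column offsets taken from the published PDB format specification; a real parser should use a library such as Biopython:

```python
# Reading one (sample) ATOM record from a .pdb file by its fixed-width columns.
line = 'ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N'
atom_name = line[12:16].strip()   # atom name (columns 13-16)
residue = line[17:20].strip()     # amino acid (columns 18-20)
chain = line[21]                  # chain identifier (column 22)
res_seq = int(line[22:26])        # position of the amino acid in the chain
x = float(line[30:38])            # 3D coordinates in angstroms (columns 31-54)
y = float(line[38:46])
z = float(line[46:54])
print(atom_name, residue, chain, res_seq, x, y, z)  # prints: N MET A 1 27.34 24.43 2.614
```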
The above three models are stored in .pdb format, which is illustrated in the figure below. Each atom in the protein is labeled according to several different pieces of information, including: the element from which the atom derives; the amino acid in which the atom is contained; the chain on which this amino acid is found; the position of the amino acid within this chain; and the 3D coordinates (x, y, z) of the atom in angstroms (10⁻¹⁰ meters). A simplified diagram showing how the .pdb format encodes the 3D coordinates of every atom while labeling the identity of this atom and the chain on which it is found. Source: https://proteopedia.org/wiki/index.php/Atomic_coordinate_file. The above information is just part of the information needed to fully represent a protein structure. For example, a .pdb file will also contain information about the disulfide bonds between amino acids. For more information, check out the official PDB documentation. Now that we know a bit more about .pdb files, we ask ourselves how to compare two proteins’ structures as we transition to the next lesson. How similar are the software predictions of the SARS-CoV-2 spike protein to each other, and how similar are they to the experimentally verified structure of the SARS-CoV spike protein? Next lesson Jaimes, J. A., André, N. M., Chappie, J. S., Millet, J. K., & Whittaker, G. R. 2020. Phylogenetic Analysis and Structural Modeling of SARS-CoV-2 Spike Protein Reveals an Evolutionary Distinct and Proteolytically Sensitive Activation Loop. Journal of Molecular Biology, 432(10), 3309–3325. https://doi.org/10.1016/j.jmb.2020.04.009 ↩ "
} ,
{
"title" : "Biological Modeling: A Free Online Course",
"category" : "",
"tags" : "",
"url" : "/",
"date" : "",
"content" : "Welcome to our course! Welcome to our free and open course in biological modeling! In this course, we are going to build models of biological systems that are relatively simple but nevertheless provide us with some deep insights into how those systems operate. Furthermore, we will perform modeling of molecular and cellular biological systems at multiple “scales” of resolution, from the study of a single protein molecule within a cell’s cytoplasm, to the interactions of many protein molecules with each other, to a much wider view that considers cells themselves interacting with each other. There are fascinating insights lurking at all these levels of resolution, and the goal of this course is to help learners understand some approaches that lead us to these insights. Have you ever wondered what process causes zebras to have stripes? Have you ever wondered how your cells can react to their environment and perform complex tasks without any intelligence guiding them? Have you wondered why the original SARS coronavirus did not become a pandemic in 2003 but SARS-CoV-2 has spread like wildfire through the population? Have you ever wondered how automated algorithms can be trained to “see” cells as well as a human can? These are the questions – and more! – that we will address in this course. We will see that each of them can be answered using modeling. Getting started. We hope that we have piqued your interest! If you are ready to jump in and get started, please join us! You can get started with our prologue, or if you would like to know more about the structure of the course, check out our course contents. Course survey and testimonials. Please use our anonymous survey so that we can track information about the demographics of our learners. If you loved our course, then we would love to hear from you as a testimonial. 
Please use the contact form to get in touch! For instructors. If you are an instructor who is interested in adopting any or all of these materials in your teaching, please feel free to do so! We only ask that you: Make sure to refer your students to our website. Get in touch with us using our contact form so that we can add you to our list of adopting instructors and provide you with updates about the project in the future. We plan to build a community of instructors adopting this course. Meet the team! This course was lovingly put together by a professor and a team of superlative students in Carnegie Mellon University’s Computational Biology Department. You can meet us on our Meet the Team page. You might also enjoy… If you enjoy this course, then we would suggest some additional resources below. MMBioS Training Workshops. If you are a biological modeling researcher and want to learn more about how the software resources presented here can be applied to your work, please check out the workshops organized as part of the MMBioS project to which this course belongs. In 2021, these workshops are going to be held online. Please find details regarding the workshops below. Hands-on Workshop on Computational Biophysics: June 28 – July 1, 2021 Cell Modeling Workshop: July 12 - July 16, 2021 Additional open educational materials in computational biology and programming. If you are interested in additional open educational materials, we think you would love some of the other free education projects developed by the project founder, Phillip Compeau. We list these resources below. Programming for Lovers: An ongoing open course in introductory programming aimed at science students. Bioinformatics Algorithms: An Active Learning Approach: A best-seller in its field, this textbook has been adopted by over 170 instructors in 40 countries around the world. 
It has also been used as the basis of the Bioinformatics Specialization on Coursera, which has reached hundreds of thousands of online learners. The text of the book is available for free on the textbook website. Rosalind: An open platform for learning bioinformatics independently through problem solving. Acknowledgements. This online course is a dissemination effort for the National Center for Multiscale Modeling of Biological Systems (MMBioS). It is graciously supported by the National Institutes of Health (grant ID: P41 GM103712). We are also grateful to Wendy Velasquez Ebanks and Ulani Qi, who provided additional work on the course during its conception."
} ,
{
"title" : "Meet the Team",
"category" : "",
"tags" : "",
"url" : "/meet-the-team/",
"date" : "",
"content" : "Meet the Team Phillip Compeau Project Founder and Lead Phillip Compeau is an Associate Teaching Professor and the Assistant Department Head for Education in the Computational Biology Department in Carnegie Mellon University’s School of Computer Science. He directs the undergraduate program in computational biology, co-directs the Precollege Program in Computational Biology, and serves as Assistant Director of the Master’s in Computational Biology program. Phillip is passionate about open online education, and his education projects have reached hundreds of thousands of learners around the world. He is the co-author of Bioinformatics Algorithms: An Active Learning Approach, which has been adopted in over 140 institutions around the world. This textbook powers the popular Bioinformatics Specialization on Coursera. He co-founded the learning platform Rosalind for learning programming, bioinformatics, and algorithms through independent problem solving. Finally, Phillip is the founder of Programming for Lovers, an online course in introductory programming motivated by fun scientific applications. Home Page Noah Yann Lee Web Designer & Content Developer Noah Yann Lee is a PhD student at Yale University in the Computational Biology and Bioinformatics program. Noah completed his undergraduate degree at Carnegie Mellon University, graduating in 2020 with a B.S. in Computational Biology with a minor in Design for Learning. From running early-childhood educational tests with the Children’s School at Carnegie Mellon for the Global Learning XPRIZE, to cultivating and sequencing phage genomes with the PhageHunters program, Noah has an appreciation for science from the micro to the macro, physical to the digital. Noah is always interested to connect with projects and organizations working with STEM, education, and science outreach. Chris Lee Content Developer Chris Lee is a current graduate student at Carnegie Mellon University and is in the M.S. 
in Computational Biology Program. Previously, he was an undergraduate student at Rutgers University and worked as an undergraduate researcher studying hydrothermal vent bacteria. In 2019, Chris graduated magna cum laude with a B.A. in Molecular Biology & Biochemistry and double minor in Chemistry and Computer Science. He is currently interested in the fields of bioinformatics and genomics. Shuanger Li Content Developer Shuanger is an MSc student studying Computational Biology at CMU. She is interested in theories of evolution and ecology, and is currently working with Dr. Oana Carja on heritable phenotypic variability. She enjoys modeling and simulation as powerful and fun ways to understand biological systems. She double majored in Environmental Sciences and Microbial Biology at UC Berkeley, where she studied Hawaiian arthropod assemblages, spider behaviors, and remediation bioreactors. Mert Inan Content Developer Mert is currently a computer science Ph.D. student at the University of Pittsburgh. Mert is an alum of the M.S. in computational biology program at Carnegie Mellon University. He loves interdisciplinary fields and has been working at the intersection of computation, biology, neuroscience, and machine intelligence. Unlocking the secrets of biology is a pleasure that Mert truly enjoys even under quarantine conditions. Nicole Matamala Content Developer Nicole Matamala is an alum of the B.S. in computational biology program at Carnegie Mellon University. "
} ,
{
"title" : "Protein Structure Prediction",
"category" : "",
"tags" : "",
"url" : "/coronavirus/more_RMSD",
"date" : "",
"content" : "To use RMSD as a quantitative measure for comparing protein structures, the structures must first be superposed in such a way that the RMSD is minimized. Back in the tutorial, superposing was accomplished by utilizing the calcTransformation() function, which returns the optimal transformation matrix between two structures such that the RMSD is minimized. This transformation matrix consists of a translation vector and a rotation matrix, which can be calculated using the Kabsch algorithm. The source code for calcTransformation() can be found here. How it Works: Kabsch Algorithm (Partial Procrustes Superimposition). The Kabsch algorithm finds the optimal rotation matrix that minimizes the RMSD between two paired sets of points (the two sets must have the same number of points). In our case, the two sets of points are the 3D coordinates of the Cα atoms (the carbon skeleton) of the two protein structures that we want to compare. The algorithm can be broken down into three major steps: translation, covariance, and singular value decomposition. Input. The input will be an (N x 3) matrix for each set of points, where N is the number of points per set. The three columns hold the 3D coordinates of each of the N points. Translation Step. Each set of points is translated such that its centroid lies on the origin of the coordinate system. This is easily done by subtracting the coordinates of the centroid from the respective point coordinates. Covariance Step. The next step is to calculate the cross-covariance matrix. Let H be the cross-covariance matrix, and P and Q be the two translated input matrices such that, for the result of the algorithm, P will be rotated into Q. Or in summation notation: Singular Value Decomposition (SVD) Step. It is possible to get the optimal rotation matrix, R, with the formula: However, this is not always possible and can become quite complicated (e.g. H not having an inverse). 
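The translation, covariance, and SVD steps (including the handedness correction described below) can be sketched in Python with NumPy. The function name is ours, and this is an illustrative implementation rather than ProDy's actual calcTransformation() source:

```python
import numpy as np

# An illustrative Kabsch sketch: superpose P onto Q and report the minimized RMSD.
def kabsch_rmsd(P, Q):
    # P and Q are (N x 3) arrays of paired coordinates.
    P = P - P.mean(axis=0)                   # translation step: center each set
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # covariance step: 3 x 3 cross-covariance
    U, S, Vt = np.linalg.svd(H)              # SVD step
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # detect an improper (reflected) rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation with handedness fix
    diff = P @ R.T - Q                       # residuals after superposition
    return np.sqrt((diff ** 2).sum() / len(P))
```

Applying this to two copies of the same structure, one rotated and translated, returns an RMSD of essentially zero, since the algorithm recovers the rotation exactly.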
Another method is to use singular value decomposition of the covariance matrix. The Kabsch algorithm utilizes the SVD of H to compute R: In order to ensure that the rotation matrix describes a right-handed coordinate system, the matrix may need to be corrected by calculating the determinant of the dot product of W and VT: Finally, R can be calculated with the following matrix formula: Return to main text"
} ,
{
"title" : "Conclusion: The Importance of Robustness in Biological Oscillators",
"category" : "",
"tags" : "",
"url" : "/motifs/conclusion",
"date" : "",
"content" : "The need for robustness in biological oscillators. Nothing exemplifies the need for robustness in biological systems better than oscillators. If your heart skips a beat when you are watching a horror movie, it should be able to return quickly to its natural rhythm. When you hold your breath to dive underwater, you shouldn’t hyperventilate when you return to the surface. And regardless of what functions your cells perform or what disturbances they find in their environment, they should be able to maintain a normal cell cycle. An excellent illustration of the robustness of the circadian clock is the body’s ability to handle jet lag. There is no apparent reason why humans would have evolved to be resilient to flying halfway around the world. And yet our circadian clock is so resilient that after a few days of fatigue and crankiness, we return to a normal daily cycle. In the previous lesson, we saw that the repressilator will oscillate even in a noisy environment. This behavior leads us to wonder about the extent to which the repressilator is robust. Much like the circadian clock responding to jet lag, we wonder how quickly the repressilator can respond to the jolt of a sudden disturbance in the concentrations of its particles. A coarse-grained model for the repressilator. We have noted that a benefit of using a reaction-diffusion particle model to study network motifs is the inclusion of built-in noise to ensure a measure of robustness. However, as we saw in the prologue with our work on Turing patterns, a downside of a particle-based model is that tracking the movements of many particles leads to a slow simulation that does not scale well given more particles or reactions. Although our model is ultimately concerned with molecular interactions, the conclusions we have made throughout this chapter are only based on the concentrations of these particles. 
Therefore, we might imagine developing a coarser-grained version of our model that allows us to draw faster conclusions about particle concentrations without keeping track of the diffusion of individual particles. In the prologue, we introduced a cellular automaton to simplify the study of Turing patterns; that model depended upon the spatial organization of particles, since particles were present at different concentrations across the grid and were diffusing at different rates. In this case, we will implement an even simpler model because we can assume that the concentrations of particles are uniform. For example, say that we are modeling a degradation reaction. If we start with 10,000 X particles, then after a single time step, we will simply multiply the number of X particles by (1-r), where r is a parameter related to the rate of the degradation reaction. As for a repression reaction like X + Y → X, we can update the concentration of Y particles by subtracting some factor times the current concentration of Y particles. This factor should be directly related to the current concentrations of both X and Y. We will focus on the technical details behind such a coarse-grained “particle-free” model in the next module. In the meantime, we provide a tutorial below showing how to build a particle-free simulation replicating the repressilator motif. As part of this tutorial, we will make a major disturbance to the concentration of one of the particles and see how long the disturbance lasts and whether the particle concentrations resume their oscillations. Visit tutorial. The repressilator is robust to disturbance. In the figure below, we show a plot of concentrations of each particle in our particle-free simulation of the repressilator, with one caveat. 
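Before turning to that caveat, the update rules just described can be sketched in a few lines of Python. This is only an illustrative particle-free update, including a mid-simulation jolt to Y; the production term and all parameter values are invented for the sketch, and the tutorial's model uses its own reactions and rates:

```python
# A toy particle-free update for a repressilator-like system (X -| Y -| Z -| X).
def step(x, y, z, r_deg=0.2, r_rep=0.001, produce=50.0):
    # degradation multiplies each count by (1 - r_deg); each repression
    # reaction subtracts a factor proportional to both concentrations
    new_x = (1 - r_deg) * x + produce - r_rep * z * x   # Z represses X
    new_y = (1 - r_deg) * y + produce - r_rep * x * y   # X represses Y
    new_z = (1 - r_deg) * z + produce - r_rep * y * z   # Y represses Z
    # concentrations cannot go negative
    return max(new_x, 0.0), max(new_y, 0.0), max(new_z, 0.0)

state = (100.0, 50.0, 10.0)
for i in range(200):
    if i == 100:
        state = (state[0], state[1] + 500.0, state[2])  # jolt Y mid-simulation
    state = step(*state)
```

Each iteration touches only three numbers, so a run like this costs essentially nothing compared with tracking thousands of diffusing particles.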
Midway through the simulation, we greatly increase the concentration of Y. Adding a significant number of Y particles to our simulation produces little ultimate disturbance to the concentrations of the three particles, which return to normal oscillations within a single cycle. Because of the spike in the concentration of Y, the reaction Y + Z → Y suppresses the concentration of Z for longer than usual, and so the concentration of X is free to increase for longer than normal. As a result, the next peak in the concentration of X is higher than normal. We might hypothesize that this process would continue, with a tall peak in the concentration of Z. However, the peak in the concentration of Z is no taller than normal, and the next peak shows a normal concentration of X. In other words, the system has very quickly absorbed the blow of an increase in concentration of Y and returned to normal within one cycle. Even with a much larger jolt to the concentration of Y, we observe the concentrations of the three particles return to normal oscillations very quickly (figure below). The repressilator is not the only network motif that leads to oscillations of particle concentrations, but robustness to disturbance is a shared feature of all these motifs. This having been said, the repressilator is particularly successful at stabilizing. And although there have been some attempts to study what makes oscillators robust, the process remains difficult to describe. By characterizing the number and type of interactions within the oscillator model, it has been shown that at least five reactions are typically needed to build a very robust oscillator1. The robustness of the repressilator also implies a bigger picture moral in biological modeling. If an underlying biological system demonstrates robustness to change, then any model of that system should also be able to withstand this change. 
Conversely, we should be wary of a model of a robust system that does not display this robustness. We have seen that even very simple network motifs can have a powerful effect on a cell’s ability to implement elegant behavior. In the next module, we will encounter a much more involved biochemical process, with far more molecules and reactions, that is used by bacteria to cleverly (and robustly) explore their environment. In fact, we will have so many particles and so many reactions that we will need to completely rethink how we set up our model. We hope that you will join us! In the meantime, check out the exercises below to continue developing your understanding of how transcription factor network motifs have evolved. Visit exercises Castillo-Hair, S. M., Villota, E. R., & Coronado, A. M. (2015). Design principles for robust oscillatory behavior. Systems and Synthetic Biology, 9(3), 125–133. https://doi.org/10.1007/s11693-015-9178-6 ↩ "
} ,
{
"title" : "Network Motifs Exercises",
"category" : "",
"tags" : "",
"url" : "/motifs/exercises",
"date" : "",
"content" : "Identifying Feed-Forward Loops and More Complex Motifs. Exercise 1: Modify the Jupyter notebook provided in the tutorial on loops to count the number of feed-forward loops in the transcription factor network for E. coli. There are eight types of feed-forward loops based on the eight different ways in which we can label the edges in the network with a “+” or a “-” based on upregulation or downregulation. The eight types of feed-forward loops.1 Exercise 2: Modify the Jupyter notebook to count the number of loops of each type present in the E. coli transcription factor network. Exercise 3: How many feed-forward loops would you expect to see in a random network having the same number of nodes as the E. coli transcription factor network? How does this compare to your answers to the previous two questions? More complex motifs may require more computational power to discover. Example of different motifs within the S. cerevisiae network.2 Exercise 4: Can you modify our Jupyter notebook for motif finding to identify circular loops of transcription factor regulation, such as the multi-component loop above? Negative Autoregulation. Using the NAR_comparison_equal.blend file from the negative autoregulation tutorial, increase the reaction rate of X1 -> X1 + Y1 to 4e5, so that the table should now look like the following: Reactants Products Forward Rate X1’ X1’ + Y1’ 4e5 X2’ X2’ + Y2’ 4e2 Y1’ NULL 4e2 Y2’ NULL 4e2 Y2’ + Y2’ Y2’ 4e2 If we plot this graph, we can see the steady states of Y1 and Y2 are different once again. Exercise 1: Can you repair the system to find the appropriate reaction rate for X2 -> X2 + Y2 to make the steady states equal once more? Are you able to adjust the reaction Y2 + Y2 -> Y2 as well? Do the reaction rates scale at the same rate? Exercise 2: One way for the cell to apply stronger “brakes” to the simple regulation rate would be to simply increase the degradation rate, rather than implement negative autoregulation. 
Why do you think that the cell doesn’t do this?Implementing More Network MotifsExercise 1: Use the NFSim tutorial implementing the repressilator as a basis to replicate the other network motif tutorials presented in this module.Next module Image adapted from Mangan, S., & Alon, U. (2003). Structure and function of the feed-forward loop network motif. Proceedings of the National Academy of Sciences of the United States of America, 100(21), 11980–11985. https://doi.org/10.1073/pnas.2133841100 ↩ Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., … Young, R. A. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298(5594), 799–804. https://doi.org/10.1126/science.1075090 ↩ "
} ,
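Exercise 1 of this entry asks for a count of feed-forward loops. A minimal sketch of such a count is given below; the function name, dictionary representation, and toy network are our own illustration, not taken from the module's Jupyter notebooks:

```python
# Hypothetical helper (our own naming, not from the tutorials): counts
# feed-forward loops X -> Y, X -> Z, Y -> Z in a directed network stored
# as a dict mapping each node to the set of nodes it regulates.
def count_ffls(adj):
    count = 0
    for x in adj:
        for y in adj[x]:
            if y == x:
                continue  # a self-loop cannot be part of an FFL
            for z in adj.get(y, set()):
                # require three distinct nodes and the "shortcut" edge X -> Z
                if z != x and z != y and z in adj[x]:
                    count += 1
    return count

# Toy network: A regulates B and C, and B regulates C -- exactly one FFL.
toy = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
ffl_count = count_ffls(toy)  # 1
```

Classifying the eight FFL types (Exercise 2) would additionally require storing the “+”/“-“ sign of each edge and inspecting the three signs of every triple found.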
{
"title" : "The Feedforward Loop Motif",
"category" : "",
"tags" : "",
"url" : "/motifs/feed",
"date" : "",
"content" : "Feedforward loopsIn the previous section, we saw that negative autoregulation can be used to speed up the response time of a protein to an external stimulus. The catch is that negative autoregulation can only be used if the protein is itself a transcription factor. Only about 300 out of 4,400 total E. coli proteins are transcription factors1. Is there a simple way of speeding up a cell’s ability to manufacture a protein if that protein is not a transcription factor?The answer will lie in another small network motif called the feedforward loop (FFL). The FFL motif, shown in the figure below, is a network substructure in which X is connected to both Y and Z, and Y is connected to Z. In this sense, calling the FFL motif a “loop” is a misnomer. Rather, it is a small structure in which there are two “paths” from X to Z; one via direct regulation of Z by X, and another in which there is an intermediate transcription factor Y. This is why this motif is called feedforward rather than feedback.The FFL motif. X regulates both Y and Z, and Y regulates Z.Note that X and Y must be transcription factors because they have edges leading out from them, but Z does not have to be a transcription factor (and in fact typically is not). There are 42 FFLs in the transcription factor network of E. coli2, and we will pass the verification that this is a significant number of FFLs as an exercise at the end of the module.Furthermore, recall that every edge of a transcription factor network is assigned a “+” or a “-“ sign based on whether the interaction corresponds to activation or repression, respectively. Accordingly, there are eight different types of FFLs, depending on the labels of the three edges in this motif.Among the 42 total FFLs in the E. coli transcription factor network, five of them have the structure below, in which the edges connecting X to Y and X to Z are assigned a “+” and the edge connecting Y to Z is assigned a “-“. 
This specific form of the FFL motif is called a type-1 incoherent feedforward loop. This form of the FFL will be our focus for the rest of the module.STOP: How could we simulate a feedforward loop with chemical reactions akin to the simulation that we used for negative autoregulation? What would we compare this simulation against?The incoherent feed-forward loop network motif. Note that X upregulates Y and Z, while Y downregulates Z.Modeling a type-1 incoherent feedforward loopAs we did in the last section, we will run two simulations. In the first, we will have a simple activation of Z by X, meaning that we will assume X is at its steady state concentration and that Z is produced by the reaction X → X + Z and removed by the reaction Z → NULL.The second simulation will include both of these reactions, but we will also have the reaction X → X + Y to model the upregulation of Y by X, along with the reaction Y + Z → Y to model the repression of Z by Y. Because Y and Z are being produced from a reaction, we will also have kill reactions for Y and Z to model the degradation of these two proteins. For the sake of fairness, we will use the same degradation rates for both Y and Z.Furthermore, in order to obtain a mathematically controlled comparison, we will need to make the reaction X → X + Z have a higher rate in the second simulation that models the FFL. If we do not raise the rate of this reaction, then the repression of Z by Y will cause the steady state concentration of Z to be lower in the second simulation.If you are feeling adventurous, then you may like to adapt the NAR tutorial to run the above two simulations and tweak the rate of the X → X + Z reaction to see if you can obtain the same steady state concentration of Z in the two simulations. 
We also provide the following tutorial guiding you through setting up these simulations, which we will interpret in the next section.Visit tutorialWhy feedforward loops speed up response timesThe figure below shows a plot visualizing the amount of Z across the two simulations. As with negative autoregulation, we see that the type-1 incoherent FFL allows the cell to ramp up production of a gene Z much faster than it would under simple regulation.The concentration of Z in the two simulations referenced in the main text. Simple activation of Z by X is shown in blue, and the type-1 incoherent FFL is shown in purple.However, you will note a slightly different pattern to the growth of Z than we saw under negative autoregulation. In negative autoregulation, the concentration of the protein approached steady state from below. In the case of the FFL, the concentration of Z grows so quickly that it passes its steady state and then returns to steady state from above.We can interpret from the model why the FFL allows for a fast response time as well as why it initially passes the steady state concentration. At the start of the simulation, Z is activated by X very quickly. X regulates the production of Y as well, but at a lower rate than the regulation of Z because Y only has its own degradation to slow this process. Therefore, more Z is initially produced than Y, which causes the concentration of Z to shoot past its eventual steady state.The more Y we have, and the more Z that we have, the more often the reaction Y + Z → Y will occur. Because the concentrations of both Y and Z increase over time, this reaction serves as the “brakes” for the concentration of Z. These brakes need to be very powerful, meaning that the rate of the reaction Y + Z → Y needs to be very high, in order to decrease the concentration of Z to its steady state.Damped oscillations give us hope of building a biological oscillatorThe feedforward process must be vital to the cell. 
Unlike negative autoregulation of a single transcription factor, the FFL requires two separate transcription factors working together in order to increase the production of our target gene. This higher evolutionary cost of implementation may help account for why it is more rare than a negatively autoregulating transcription factor.We only considered one of the eight types of FFL in this lesson. You might wonder whether any of the other seven FFL structures serve as network motifs. For example, what happens if X activates Z, X represses Y, and Y activates Z? We will explore these additional FFL structures in the exercises at the end of the module.Finally, recall the figure above, in which the concentration of Z swung past its steady state before returning to the steady state. This figure is reminiscent of a damped oscillation process in which the concentration of a particle alternates between being above and below its steady state, while the amplitude of the oscillation gets smaller and smaller.In a damped oscillation, the value of some variable (shown on the y-axis) swings back and forth around an asymptotic value while the amplitude of the oscillations decreases.3In a true oscillation process, the concentration of the particle is not damped, and this concentration alternates with regularity between a minimum and maximum value. Oscillations are commonplace in nature and remarkable because the oscillating behavior arises from the system and can be maintained without outside influence. But can oscillations be explained by transcription factor network motifs? We hope you will join us in the next lesson to find out.Next lesson Gene ontology database with “transcription” keyword: https://www.uniprot.org/. ↩ Mangan, S., & Alon, U. (2003). Structure and function of the feed-forward loop network motif. Proceedings of the National Academy of Sciences of the United States of America, 100(21), 11980–11985. 
https://doi.org/10.1073/pnas.2133841100 ↩ https://www.toppr.com/guides/physics/oscillations/damped-simple-harmonic-motion/ ↩ "
} ,
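The overshoot-then-settle behavior of Z described in this entry can be sketched with a deterministic, well-mixed approximation of the two simulations. All rate values and variable names below are illustrative assumptions of ours, not the tutorial's parameters; the FFL's production rate of Z is raised so that both systems share the same steady state, mirroring the mathematically controlled comparison:

```python
# Illustrative Euler-step sketch (our own parameter choices):
#   simple regulation:      dZ/dt = beta_simple - alpha*Z
#   type-1 incoherent FFL:  dY/dt = beta_y - alpha*Y
#                           dZ/dt = beta_ffl - alpha*Z - k*Y*Z
# beta_ffl is raised so both systems share the steady state Z* = 1.
alpha, k = 1.0, 10.0
beta_simple = 1.0                  # steady state beta_simple/alpha = 1
beta_y = 1.0                       # Y steady state = beta_y/alpha = 1
beta_ffl = alpha * 1.0 + k * 1.0   # = 11, so Z* = beta_ffl/(alpha + k*Y*) = 1

dt, steps = 1e-3, 20000
z_simple, y, z_ffl = 0.0, 0.0, 0.0
z_ffl_trace = []
for _ in range(steps):
    z_simple += dt * (beta_simple - alpha * z_simple)
    y += dt * (beta_y - alpha * y)
    z_ffl += dt * (beta_ffl - alpha * z_ffl - k * y * z_ffl)
    z_ffl_trace.append(z_ffl)

peak = max(z_ffl_trace)    # the FFL overshoots its steady state...
final = z_ffl_trace[-1]    # ...before relaxing back down toward Z* = 1
```

In this sketch, `z_simple` approaches its steady state from below, while `z_ffl` shoots past 1 before Y's repression term pulls it back, matching the plot interpreted in the entry.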
{
"title" : "Using Randomness to Verify Network Motifs",
"category" : "",
"tags" : "",
"url" : "/motifs/finding",
"date" : "",
"content" : "The loop: the simplest network motifIn the previous lesson, we introduced the transcription factor network, in which a protein X is connected to a protein Y if X is a transcription factor that regulates the production of Y. We also saw that in the E. coli transcription factor network, there seemed to be a large number of loops, or edges connecting X to X that correspond to the autoregulation of X.In the introduction, we briefly introduced the notion of a network motif, or a structure occurring often throughout a network. In the remainder of this module, we discuss how to identify network motifs as well as explain why they occur so often in a network. And we will start our work by studying the loop.Using randomness to determine statistical significanceWe first need to argue rigorously that a loop is indeed a motif within a transcription factor network. To do so, we will apply a paradigm that occurs throughout computational biology (and science in general) when determining whether an observation is significant. We will compare our observation against a randomly generated database — the power of randomness strikes again!A seminal biological example of this paradigm is the search tool BLAST, which allows researchers to compare a query against a database (e.g., comparing the DNA sequence of a newly sequenced gene against a collection of many known proteins). 
Once BLAST finds a “hit” in which the query occurs with slight modifications within the database, it asks, “What is the probability that we would find a hit of the same quality of the query against a randomly generated ‘decoy’ database?” If this probability is low, then we can feel confident that the hit is statistically significant.STOP: How can we apply this paradigm to determine whether a transcription factor network contains a significant number of loops?Comparing a real transcription factor network against a random networkTo determine whether the number of loops in the transcription factor network of E. coli is significant, we will compare the number of loops that we find in this network against the expected number of loops we would find in a randomly generated network. If the number of loops in the real network is much higher than the number of loops in the random network, then we have strong evidence that there is some selective force causing a loop to be a network motif.There are multiple ways to generate a random network, but we will use an approach developed by Edgar Gilbert in 19591. Given an integer n and a probability p (between 0 and 1), we first form n nodes; then, for every possible pair of nodes X and Y, we connect X to Y via a directed edge with probability p.STOP: What should n and p be if we are generating a random network to compare against the E. coli transcription factor network?The full E. coli transcription factor network contains thousands of genes, most of which are not transcription factors. As a result, the approach described above may form a random network that connects non-transcription factors to other nodes, which we should avoid.Instead, we will focus on the network comprising only E. coli transcription factors that regulate each other. This network has 197 nodes and 477 edges, and so we will begin by forming a random network with n = 197 nodes.We then select p to ensure that our random network will on average have 477 edges. 
To do so, we note that there are n² pairs of nodes that could have an edge connecting them (n choices for the starting node and n for the ending node). If we were to set p equal to 1/n², then we would expect on average only to see a single edge in the random network. We therefore scale this value by 477 and set p equal to 477/n² so that we will see, on average, 477 edges in our random network.We are now ready to build a random network and compare it against the real transcription factor network. The link below will take you to a short tutorial that includes a Jupyter notebook running this comparison and demonstrating that the number of loops in the E. coli transcription factor network is significant. You may also feel free to skip ahead to the section below, which discusses the results of this tutorial.Visit tutorialThe negative autoregulation motifIn a random network containing n nodes, the probability that a given edge is a loop is 1/n. Therefore, if the network has e edges, then we would on average see e/n loops in the network.In our case, n is 197, and e is 477; therefore, on average, we will only see approximately 2.42 loops in a random network. Yet the real E. coli network contains 130 loops!Furthermore, in a random network, we would expect about half of the edges to correspond to upregulation, and the other half to correspond to downregulation. But if you followed the tutorial linked in the previous section, then you know that of the 130 loops in the E. coli network, 35 correspond to upregulation and 95 correspond to downregulation.So, not only is autoregulation an important feature of transcription factors, but these transcription factors tend to negatively autoregulate. Why in the world would organisms have evolved autoregulation only to slow their own transcription? In the next lesson, we will begin to unravel the mystery.Next lesson Gilbert, E.N. (1959). “Random Graphs”. Annals of Mathematical Statistics. 30 (4): 1141–1144. doi:10.1214/aoms/1177706098. 
↩ "
} ,
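The expected-loop calculation in this entry (p = 477/n², expected loops = e/n ≈ 2.42) can be checked with a few lines of Python. This is a sketch with our own variable names, not the module's notebook; it simulates only the self-edges of a Gilbert random network, since each of the n possible self-edges appears independently with probability p:

```python
import random

# E. coli TF-TF network dimensions quoted in the lesson
n, e = 197, 477
p = e / n**2           # edge probability giving ~477 edges on average

# Each of the n possible self-edges appears with probability p,
# so the expected number of loops is n * p = e / n (about 2.42).
expected_loops = n * p

random.seed(0)         # fixed seed for reproducibility
trials = 500
total = 0
for _ in range(trials):
    # draw the n potential self-loops of one random network
    total += sum(1 for _ in range(n) if random.random() < p)
avg_loops = total / trials
```

The simulated average lands near 2.42, far below the 130 loops observed in the real network, which is the statistical argument the lesson makes for calling the loop a motif.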
{
"title" : "Introduction: Networks Rule Biology",
"category" : "",
"tags" : "",
"url" : "/motifs/home",
"date" : "",
"content" : "by Noah Lee and Phillip CompeauIn the prologue, we worked with a particle-based model that simulated the interactions of skin cells to produce complex Turing patterns. In this module, we will zoom into a much lower biological scale and model protein interactions, which occur on a molecular level. The scale of these interactions is tiny: a protein is typically on the order of about 10nm in diameter. (For comparison, a light microscope’s highest resolution is about 2000 nm, and the diameter of a single human hair is about 100,000 nm.)In this module, we turn our attention to the question of how a cell can adjust its protein concentrations in reaction to an ever-changing environment. We will see that the mechanisms the cell uses to make these changes are rapid, robust, and elegant.We will also introduce the concept of a network, or a collection of nodes along with edges that connect pairs of nodes. Before continuing, we will take the opportunity to give a few examples of biological networks.When studying the functions and binding of proteins, biologists may build a protein-protein interaction network (figure below). Nodes correspond to proteins, and two proteins are connected with an edge if they are known to interact.A complete hepatitis C virus-host protein-protein interaction network in hepatoma cells.1 Nodes correspond to proteins, and an edge connects two proteins if the two proteins interact.When studying the more complex interactions and processes taking place within a cell, biologists form a metabolic network (figure below). Nodes correspond to substances in a chemical reaction, and an edge connects two nodes if there is some enzyme that catalyzes a reaction involving these substances.The metabolic network of tomato cells.2When studying the nervous system, biologists build neuronal networks that link neurons together according to how they are linked in the body (figure below). 
These networks have been studied since the 1940s but have recently exploded as a model for solving applied problems in machine learning.Mapping and models of neurons.3In this module, we will introduce yet another fundamental biological network called a transcription factor network that involves the proteins that drive a cell’s response to its environment. We will hunt for network motifs, or commonly occurring structures, hidden in this network. We will then use modeling to address the more challenging question of why these motifs have evolved in order to help the cell respond to its environment.But before we get ahead of ourselves, let us introduce some of the molecular biology fundamentals we will need to complete our analysis. As in the prologue, you may already know this biological background, in which case you should feel free to skim the next lesson.Next lesson Ramage, Holly & Kumar, Gagandeep & Verschueren, Erik & Johnson, Jeffrey & Dollen, John & Johnson, Tasha & Newton, Billy & Shah, Priya & Horner, Julie & Krogan, Nevan & Ott, Melanie. (2015). A Combined Proteomics/Genomics Approach Links Hepatitis C Virus Infection with Nonsense-Mediated mRNA Decay. Molecular cell. 57. 329-340. 10.1016/j.molcel.2014.12.028 ↩ Colombie, Sophie & Nazaret, Christine & Bénard, Camille & Biais, Benoit & Mengin, Virginie & Solé, Marion & Fouillen, Laetitia & Dieuaide‐Noubhani, Martine & Mazat, Jean-Pierre & Beauvoit, Bertrand & Gibon, Yves. (2014). Modelling central metabolic fluxes by constraint-based optimization reveals metabolic reprogramming of developing Solanum lycopersicum (tomato) fruit. The Plant Journal. 81. 10.1111/tpj.12685. ↩ An, Hongyu. (2017). Opportunities and challenges on nanoscale 3D neuromorphic computing system. 10.1109/ISEMC.2017.8077906. ↩ "
} ,
{
"title" : "The Negative Autoregulation Motif",
"category" : "",
"tags" : "",
"url" : "/motifs/nar",
"date" : "",
"content" : "Hunting for a biological motivation for negative autoregulationTheodosius Dobzhansky famously wrote that “nothing in biology makes sense except in the light of evolution.”1 In the spirit of this quotation, there must be some evolutionary reason for the presence of so many negatively autoregulating transcription factors (i.e., transcription factors that slow their own transcription). Our goal is to use biological modeling to establish this justification.Say that a transcription factor X regulates another transcription factor Y, and consider two cells. In both cells, X upregulates the transcription of Y, but in the second cell, Y also negatively autoregulates.In this lesson, we will simulate a “race” to the steady state concentration of Y in the two cells. The premise is that the cell that reaches this steady state faster is able to respond more quickly to its environment and is therefore more fit for survival.Simulating transcriptional regulation with a reaction-diffusion modelIn the prologue, we simulated chemical reactions to run a randomized particle-based model. In this lesson, we will apply the same model, in which the particles correspond to transcription factors X and Y.We will begin with a model of the first cell, in which X upregulates Y but we do not have negative autoregulation of Y. We start without any Y particles and a constant number of X particles. To simulate X upregulating the expression of Y, we add the reaction X → X + Y. This reaction ensures that in a given interval of time there is a constant underlying probability that a given X particle will spontaneously form a new Y particle.We should also account for the fact that proteins are degraded over time by enzymes called proteases. Protein degradation offers a natural mechanism by which proteins at high concentrations can return to a steady-state. To this end, we add a “kill” reaction that removes Y particles. 
We will assume that X starts at steady-state, meaning that X is being produced at a rate that exactly balances its degradation rate, and we will therefore not need to add reactions to the model simulating the production or degradation of X.Diffusion of the X and Y particles is not necessary because there is no reaction in which more than one particle interacts, but we will allow both X and Y particles to diffuse through the system at the same rate.STOP: What chemical reaction could be used to add negative autoregulation of Y to this simulation?We now will simulate the second cell, which will inherit the reactions of the first cell while adding negative autoregulation of Y. We will do so using the reaction 2Y → Y. In other words, when two Y particles encounter each other, there is some probability that one of the particles serves to remove the other, which mimics the process of a transcription factor turning off another copy of itself during negative autoregulation.To recap, the simulations of both cells will include diffusion of X and Y, removal of Y, and the reaction X → X + Y. The second simulation, which includes negative autoregulation of Y, will add the reaction 2Y → Y. All of these reactions will take place according to rate parameters. You can explore these simulations in the following tutorial, and we will reflect on these simulations in the next section.Visit tutorialEnsuring a mathematically controlled comparisonIf you followed the above tutorial, then you were likely disappointed in our second cell and its negative autoregulating transcription factor Y. The figure below shows a plot of Y particles for the two simulations.A comparison of the number of Y particles across two simulations. 
In the first cell (shown in red), we only have upregulation of Y by X, whereas in the second cell (shown in yellow), we keep all parameters fixed but add a reaction simulating negative autoregulation of Y.By allowing Y to slow its own transcription, we wound up with a simulation in which the final concentration of Y was lower than when we only had upregulation of Y by X. It seems like we are back at square one; why in the world would negative autoregulation be so common?The answer to our quandary is that the model we built was not a fair comparison between the two systems. In particular, the two simulations must be controlled so that they have approximately the same steady-state concentration of Y. Ensuring this equal footing for the two simulations is called a mathematically controlled comparison.2STOP: How can we change the parameters of our models to obtain a mathematically controlled comparison?There are a number of parameters that we must keep constant across the two simulations because they are not related to regulation: the diffusion rates of X and Y, the number of initial particles X and Y, and the degradation rate of Y.With these parameters fixed, the only way that the steady-state concentration of Y can be the same in the two simulations is if we increase the rate at which the reaction X → X + Y takes place in the second simulation. If you followed the previous tutorial, then you may like to try your hand at adjusting the rate of the X → X + Y reaction on your own. The following tutorial adjusts this parameter to build a mathematically controlled comparison that we will analyze in the next section.Visit tutorialAn evolutionary basis for negative autoregulationThe figure below plots the number of Y particles for the two simulations on the same chart over time, with the rate of the X → X + Y reaction increased in the simulation involving negative autoregulation. The two simulations now have approximately the same steady-state concentration of Y. 
However, the second simulation is able to reach this concentration faster; that is, its response time to the external stimulus causing the increase in regulation of Y is faster.A comparison of the number of Y particles across the same two simulations from the previous figure, with the change that in the second simulation (shown in yellow), we increase the rate of the reaction simulating upregulation of Y by X. As a result, the two simulations have approximately the same steady state of Y, and the simulation involving negative autoregulation reaches this steady state more quickly.More importantly, a justification for the evolutionary purpose of negative autoregulation presents itself. Because the rate of the reaction X → X + Y is higher in the second simulation, the number of Y particles in this simulation increases at a much faster rate. As the concentration of Y increases, the rate at which new Y particles are added to the system is the same in the two simulations because this reaction only depends on the number of X particles, which is constant. However, the rate at which Y particles are removed is higher in the second simulation because in addition to the degradation reaction Y → NULL, we have the negative autoregulation reaction 2Y → Y serving to remove Y particles. As a result, the plot of Y particles over time flattens more quickly (i.e., its derivative decreases faster) for the second simulation.Most importantly, this plot helps explain why negative autoregulation may have evolved. The simulation involving negative autoregulation wins the “race” to a steady-state concentration of Y, and so we can conclude that a cell in which this transcription factor is negatively autoregulated is more fit for survival than one in which it is not. 
Uri Alon3 has proposed an excellent analogy of a negatively autoregulating transcription factor as a sports car that has a powerful engine (corresponding to the higher rate of the reaction producing Y) and sensitive brakes (corresponding to the negative autoregulation reaction slowing the production of Y).In this lesson, we have seen that particle-based simulations can be powerful for justifying why a network motif is prevalent. What are some other commonly occurring network motifs in transcription factor networks? And what evolutionary purposes might they serve? We will spend the remainder of this module delving into these questions.Next lessonCitations Dobzhansky, Theodosius (March 1973), “Nothing in Biology Makes Sense Except in the Light of Evolution”, American Biology Teacher, 35 (3): 125–129, JSTOR 4444260) ↩ Savageau, 1976 https://ucdavis.pure.elsevier.com/en/publications/biochemical-systems-analysis-a-study-of-function-and-design-in-mo ↩ Alon, Uri. An Introduction to Systems Biology: Design Principles of Biological Circuits, 2nd Edition. Chapman & Hall/CRC Mathematical and Computational Biology Series. 2019. ↩ "
} ,
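The "race" to steady state described in this entry can be sketched deterministically. The rates and helper name below are our own illustrative assumptions, not the tutorial's BlenDer/CellBlender parameters; the NAR production rate is raised so both cells share the steady state Y* = 1, giving a mathematically controlled comparison:

```python
# Illustrative sketch (our own parameters):
#   simple regulation:  dY/dt = beta - alpha*Y
#   NAR:                dY/dt = beta_nar - alpha*Y - k*Y^2
# beta_nar is chosen so that both systems share the steady state Y* = 1.
alpha, k = 1.0, 10.0
beta_simple = 1.0                     # steady state beta_simple/alpha = 1
beta_nar = alpha * 1.0 + k * 1.0**2   # = 11, so NAR's steady state is also 1

def response_time(rate, dt=1e-3, t_max=10.0, target=0.9):
    """Euler-step time for Y (starting at 0) to first reach 90% of Y* = 1."""
    y, t = 0.0, 0.0
    while t < t_max:
        if y >= target:
            return t
        y += dt * rate(y)
        t += dt
    return t_max

t_simple = response_time(lambda y: beta_simple - alpha * y)
t_nar = response_time(lambda y: beta_nar - alpha * y - k * y * y)
```

With the production rate boosted and the quadratic "brakes" applied, `t_nar` comes out far smaller than `t_simple` (which is about ln(10) ≈ 2.3 time units), reproducing the sports-car intuition of a powerful engine plus sensitive brakes.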
{
"title" : "Transcription Factor Networks",
"category" : "",
"tags" : "",
"url" : "/motifs/networks",
"date" : "",
"content" : "Transcription factor networksOnce we know which genes each transcription factor regulates, we can consolidate this information into a transcription factor network. The nodes in the network represent an organism’s proteins, and we connect X to Y with an edge if X is a transcription factor that regulates the expression of protein Y. Any node can have an edge leading into it, but only a transcription factor can have an edge leaving it.The figure below shows a portion of the transcription factor network for Escherichia coli, the workhorse model organism of bacterial studies. Even though E. coli is a bacterium, we will be able to draw powerful conclusions about gene regulation from its transcription factor network. The true network is much larger, consisting of thousands of genes and around 300 transcription factors1, and we will need to analyze it computationally to draw these conclusions.Note that the edges in the E. coli transcription factor network below are colored red or green. An edge connecting X to Y is colored green if X upregulates Y, and it is colored red if X downregulates Y. (Alternatively, we could label the edges with a “+” or “-“.)A subset of the E. coli transcription factor network.2 An edge from X to Y denotes that X is a transcription factor that regulates Y. Edges corresponding to upregulation are colored green, and edges corresponding to downregulation are colored red. Click here to zoom in on this network.STOP: Select the expanded view of the transcription factor network in the figure above. Do you notice anything interesting about this network?AutoregulationThe E. coli transcription factor network seems to have a surprising number of loops, or edges that connect a node to itself. It is worth pausing for a moment to consider the implications of a loop in a transcription factor network. 
What does it even mean for a transcription factor to regulate itself?A transcription factor is a protein, which means that because of the Central Dogma of Molecular Biology, the transcription factor is produced as the result of transcription and translation of a gene appearing in an organism’s DNA. In autoregulation, illustrated in the figure below, the transcription factor protein then binds to the DNA in the upstream region of the gene encoding the same transcription factor. This type of feedback is a beautiful and surprising feature of a simple biological system.A simplified illustration of autoregulation. “Protein” labels the transcription factor protein, which binds to the DNA encoding this transcription factor, labeled by “Gene”.3Transcription factor autoregulation leads us to ask two questions. First, how can we conclude that the number of loops in a transcription factor network is “surprisingly large”? And second, if autoregulation is so common, then why would a transcription factor have evolved to regulate its own transcription?Next lesson Gene ontology database with “transcription” keyword: https://www.uniprot.org/. ↩ Samal, A. & Jain, S. The regulatory network of E. coli metabolism as a Boolean dynamical system exhibits both homeostasis and flexibility of response. BMC Systems Biology, 2, 21 (2008). https://doi.org/10.1186/1752-0509-2-21 ↩ Arani, B. M. S., Mahmoudi, M., Lahti, L., González, J., & Wit, E. C. (2018). Stability estimation of autoregulated genes under Michaelis-Menten-type kinetics. Physical Review E, 97, 62407. https://doi.org/10.1103/PhysRevE.97.062407 ↩ "
} ,
{
"title" : "Building a Biological Oscillator",
"category" : "",
"tags" : "",
"url" : "/motifs/oscillators",
"date" : "",
"content" : "Oscillators are everywhere in natureEven if placed in a bunker, humans will maintain a roughly 24-hour cycle of sleep and wakefulness1. This circadian rhythm guiding our daily schedule is not unique to animals but rather is present throughout living things, including plants and even cyanobacteria2.Life processes like the circadian rhythm that oscillate over time are not confined to circadian rhythms. You may feel like you have some control over when you go to bed, but your heart and respiratory system both follow cyclical rhythms that are subconscious. To take a much lower level example, eukaryotic cells are governed by a strict cell cycle as the cells grow and divide.We might guess from what we have learned in this module that these cyclical processes must be built upon simple rules serving as a pacemaker controlling them. However, the question remains as to what these pacemakers are and how they correctly execute oscillations over and over, throughout an organism’s life.Researchers have identified many network motifs that facilitate oscillation, some of which are very complicated and include many components. In this lesson, we will focus on a simple three-component oscillator motif called a repressilator3. In this lesson, we will implement the repressilator with a particle simulator.The repressilator: a synthetic biological oscillatorThe repressilator motif is shown in the figure below. In this motif, all three proteins are transcription factors, and they form a cycle in which X represses Y, Y represses Z, and Z represses X (hence the name). The repressilator clearly forms a feedback loop, but nothing a priori about this motif would indicate that it would lead to oscillation; after all, we have already seen feedback processes in this module that did not lead to oscillation.The repressilator motif for three particles X, Y, and Z. 
X represses Y, which represses Z, which in turn represses X, forming a feedback loop.STOP: Try building a reaction-diffusion model for the repressilator, assuming that we start with an initial concentration of X and no Y or Z particles.To build a reaction-diffusion model accompanying the repressilator, we start with a quantity of X particles, and no Y or Z particles. We assume that all three particles diffuse at the same rate and degrade at the same rate.Furthermore, we assume that all three particles are produced as the result of an activation process by some other transcription factor(s), which we assume happens at the same rate. We will use a hidden particle I that serves to activate the three visible particles via the three reactions I → I + X, I → I + Y, and I → I + Z, all taking place at the same rate.In the previous lesson on the feed-forward loop, we saw that we can use the reaction X + Y → X to model the repression of Y by X. To complete the repressilator model, we will add the two reactions Y + Z → Y and Z + X → Z, having the same rate as the reaction X + Y → X.If you have followed our previous tutorials, then you may feel comfortable taking off the training wheels and implementing the repressilator with your own reaction-diffusion model. We also are happy to provide the following tutorial.Visit tutorialInterpreting the repressilator’s oscillationsThe figure below shows the results of our simulation by plotting the number of X, Y, and Z particles over time. As we can see, the system shows clear oscillatory behavior, with the concentrations of X, Y, and Z taking turns being at high concentration.STOP: Why do you think that the repressilator motif leads to this pattern of oscillations?Modeling the repressilator’s concentration of particles. 
X is shown in yellow, Y is shown in red, and Z is shown in blue.We will attempt to provide a high-level explanation of why the repressilator produces oscillations from a simple set of rules.Because the concentration of X starts out high, with no Y or Z present, the concentration of X briefly increases because its rate of production exceeds its rate of degradation. Because there are no Y or Z particles present, there are no Y or Z to degrade, and the concentrations of these particles start increasing as well.As soon as there are some Z particles present, the reaction Z + X → Z occurs often enough for the rate of removal of X to exceed its rate of production, accounting for the first peak in the figure above.Furthermore, because the concentration of X particles begins high, the reaction X + Y → X prevents the number of Y particles from growing initially. This is because the remaining repression reaction (Y + Z → Y) has very little effect initially because the concentrations of Y and Z are both low. As a result, the rate of production of Z is higher than its rate of removal, and so its concentration increases quickly while the concentration of Y stays low.In summary, after an initial rise, the concentration of X plummets, with the concentration of Z rising up to replace it. The concentration of Y increases, but at a slower rate than that of Z. This situation is shown by the second (blue) peak in the figure above.As a result, Z and X in effect have switched roles. Because there is a high concentration of Z, the reaction Y + Z → Y will be frequent and cause the concentration of Z to decrease. Furthermore, because the concentration of X has decreased, and the concentration of Y is still relatively low, the reaction X + Y → X will occur less often, allowing the concentration of Y to continue to rise. 
Eventually, the decrease in Z and the increase in Y will account for the third peak (red) in the figure above. At this point, the reaction X + Y → X will suppress the concentration of Y. Because the concentrations of X and Z are both lower, the reaction Z + X → Z will not greatly influence the concentration of X, which will rise to meet the falling concentration of Y, and we have returned to our original situation, at which point the cycle will begin again. The power of noise: Take another look at the figure showing the oscillations of the repressilator. You will notice that the concentrations zigzag as they travel up or down, and that they peak at slightly different levels each time. This noise in the repressilator’s oscillations is due to variance as the particles travel around randomly. Specifically, the repression reactions require two particles to collide in order for the reaction to take place. Due to random chance, these collisions may occur more or less often than expected. We should also note that some of this noise is due to low sample size: we have around 150 molecules at each peak in the above figure, but a given cell may have on the order of 1,000 to 10,000 molecules of a single protein.4 Yet the noise that appears in the repressilator’s oscillations is a feature, not a bug. As we have discussed previously, the cell’s molecular interactions are inherently random. So if we see oscillations in a simulation that includes noise arising from random chance, we can be confident that this simulation is robust to a certain amount of variation. In this module’s conclusion, we will further explore the concept of robustness as it pertains to the repressilator. What happens if our simulation experiences a much greater disturbance to the concentration of one of the particles? Will it still be able to recover and return to the same oscillatory pattern? Next lesson Aschoff, J. (1965). Circadian rhythms in man. Science 148, 1427–1432. 
↩ Grobbelaar N, Huang TC, Lin HY, Chow TJ. 1986. Dinitrogen-fixing endogenous rhythm in Synechococcus RF-1. FEMS Microbiol Lett 37:173–177. doi:10.1111/j.1574-6968.1986.tb01788.x. ↩ Elowitz MB, Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature. 2000;403(6767):335-338. doi:10.1038/35002125 ↩ Brandon Ho, Anastasia Baryshnikova, Grant W. Brown. Unification of Protein Abundance Datasets Yields a Quantitative Saccharomyces cerevisiae Proteome. Cell Systems, 2018; DOI: 10.1016/j.cels.2017.12.004 ↩ "
} ,
{
"title" : "Transcription and DNA-Protein Binding",
"category" : "",
"tags" : "",
"url" : "/motifs/transcription",
"date" : "",
"content" : "The central dogma of molecular biologyRecall that DNA is a double-stranded molecule consisting of the four nucleobases adenine, cytosine, guanine, and thymine. A gene is a region of an organism’s DNA that is transcribed into a single-stranded RNA molecule in which thymine is converted to uracil and the other bases remain the same.The RNA transcript is then translated into an amino acid sequence. Because there are four different bases but twenty amino acids available, RNA is translated in codons, or triplets of nucleobases. The figure below shows the way in which codons are translated into amino acids, which is called the genetic code.The genetic code, which dictates the conversion of RNA codons into amino acids.DNA can be thought of as a blueprint for storing information that flows from DNA to RNA to protein. This flow of information is called the central dogma of molecular biology (see figure below).The central dogma of molecular biology states that molecular information flows from DNA in the nucleus, into the RNA that is transcribed from DNA, and then into proteins that are translated from RNA. Image courtesy: Dhorpool, Wikimedia commons user.Transcription factors control gene regulationAll of your cells have essentially the same DNA, and yet your liver cells, neurons, and brain cells are able to serve different functions. This is because the rates at which these genes are regulated, or converted into RNA and then protein, vary between genes in different tissues.Gene regulation typically occurs at either the DNA or protein level. At the DNA level, regulation is modulated by transcription factors, master regulator proteins that bind upstream of genes and serve to either activate or repress a gene’s rate of transcription. Activation will cause the gene to be “upregulated”, with increased transcription, and repression will cause the gene to be “downregulated”.Note that by the central dogma, transcription factors are involved in a sort of feedback loop. 
DNA is transcribed into RNA, which is translated into the protein sequence of a transcription factor, which then binds to the upstream region of some other gene and changes its rate of transcription.Transcription factors are vital for the cell’s response to its environment because extracellular stimuli can serve to activate a transcription factor via a system of signaling molecules that convey a signal through relay molecules to the transcription factor (see figure below). Only when the transcription factor is activated will it regulate its target protein(s).A cell receiving a signal which triggers a response in which this signal is “transduced” into the cell, resulting in transcription of a gene. We will discuss signal transduction in greater detail in a future module.1In a future module, we will discuss the details of how the cell detects an extracellular signal and conveys it as a response within the cell. In this module, we concern ourselves with the study of the relationship between transcription factors and the genes they regulate.Determining if a given transcription factor regulates the expression of a given geneOver the years, a number of both computational and experimental approaches have been developed to identify the collection of genes that a given transcription factor regulates.For example, genes that are regulated by the same transcription factor often share the same short region of DNA preceding the genes where the transcription factor binds. Computational biologists have developed algorithms to scan through the genome, looking for genes with similar regions preceding them, and predicting that they are regulated by the same transcription factor. 
If you are interested in learning more about these algorithms, we encourage you to check out Chapter 2 of Bioinformatics Algorithms: An Active Learning Approach, which can be read for free online. A widespread experimental practice for determining whether a protein binds to a given region of DNA is called ChIP-seq2, which is short for chromatin immunoprecipitation sequencing. This approach, which is illustrated in the figure below, combines an organism’s DNA with a collection of proteins that bind to DNA (which in this case would be transcription factors). After allowing the proteins to bind naturally to the DNA, the DNA (with proteins attached) is cleaved into much smaller fragments of a few hundred base pairs. As a result of this process, we obtain a collection of DNA fragments, some of which are attached to protein. The question is how to isolate the fragments of DNA that are bound to a single transcription factor of interest so that we can infer the fragments of DNA to which that transcription factor binds. The clever trick is to use an antibody (i.e., a protein that your immune system produces to identify foreign pathogens). The antibody is designed to identify a single protein, and it is attached to a bead. Once the antibody attaches to the protein target, a single complex is formed consisting of the DNA fragment, the transcription factor bound to the DNA, the antibody that recognized the transcription factor, and the bead bound to the antibody. Because of the bead, these complexes can be filtered out as “precipitate” from the solution, and we are left with just the DNA fragments that are bound to our transcription factor. In a final step, we unlink the protein from the DNA, leaving a collection of DNA fragments that were previously bound to a single transcription factor. These fragments are read using DNA sequencing to determine the order of nucleotides on each fragment. 
Once we have read the fragments, we can then scan through the genome to determine the genes that these fragments precede. We can then postulate that these are the genes regulated by the transcription factor!An overview of ChIP-seq. Figure courtesy Jkwchui, Wikimedia Commons user.You may also like to check out the following excellent video on identifying genes regulated by a transcription factor. This video was produced by students in the 2020 PreCollege Program in Computational Biology at Carnegie Mellon. The presenters won an award from their peers for their work, and for good reason!STOP: How do you think that researchers measure whether a transcription factor activates or inhibits a given gene?Organizing transcription factor informationAs a result of both computational and experimental techniques, we have learned a great deal about which transcription factors regulate which genes. But what can we do with this information?We would like to organize the relationships between transcription factors and the genes they regulate in a way that will help us identify patterns in these relationships. In the next section, we will see that consolidating gene regulatory information into a network will allow us to infer how cells have evolved to quickly change the expression of their genes in response to a dynamic environment.Next lesson CC https://www.open.edu/openlearn/science-maths-technology/general-principles-cellular-communication/content-section-1 ↩ Johnson, D. S., Mortazavi, A., Myers, R. M., & Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science, 316(5830), 1497–1502. https://doi.org/10.1126/science.1141319 ↩ "
} ,
{
"title" : "Software Tutorial: Implementing the Feed-Forward Loop Motif",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_feed",
"date" : "",
"content" : "In this tutorial, we will use CellBlender to run a (mathematically controlled) comparison of simple regulation against regulation via the type-1 incoherent feed-forward loop that we saw in the main text.Load your CellBlender_Tutorial_Template.blend file from the Random Walk Tutorial. Save your file as ffl.blend. You may also download the completed tutorial file here.Go to CellBlender > Molecules and create the following molecules: Click the + button. Select a color (such as white). Name the molecule X1. Select the molecule type as Surface Molecule. Add a diffusion constant of 1e-6. Up the scale factor to 5 (click and type “5” or use the arrows).Repeat the above steps to make sure that the following molecules are entered with the appropriate parameters. Molecule Name Molecule Type Diffusion Constant Scale Factor X1 Surface 1e-6 5 Z1 Surface 1e-6 1 X2 Surface 1e-6 5 Y2 Surface 1e-6 1 Z2 Surface 1e-6 1 Now go to CellBlender > Molecule Placement to set the following release sites for our molecules: Click the + button. Select or type in the molecule X1. Type in the name of the Object/Region Plane. Set the Quantity to Release as 300.Repeat the above steps to make sure all of the following molecule release sites are entered. Molecule Name Object/Region Quantity to Release X1 Plane 300 X2 Plane 300 Next go to CellBlender > Reactions to create the following reactions: Click the + button. Under reactants, type X1’ (note the apostrophe). Under products, type X1’ + Z1’. Set the forward rate as 4e2.Repeat the above steps for the following reactions. Reactants Products Forward Rate X1’ X1’ + Z1’ 4e2 Z1’ NULL 4e2 X2’ X2’ + Y2’ 2e2 X2’ X2’ + Z2’ 4e3 Y2’ + Z2’ Y2’ 4e2 Y2’ NULL 4e2 Z2’ NULL 4e2 Go to CellBlender > Plot Output Settings to set up a plot as follows: Click the + button. Set the molecule name as Z1. Ensure World is selected. Ensure Java Plotter is selected. Ensure One Page, Multiple Plots is selected. 
Ensure Molecule Colors is selected.Repeat the above steps to ensure that we plot all of the following molecules. Molecule Name Selected Region Z1 World Z2 World We are now ready to run our simulation. Go to CellBlender > Run Simulation and select the following options: Set the number of iterations to 12000. Ensure the time step is set as 1e-6. Click Export & Run.Once the simulation has run, we can visualize our data with CellBlender > Reload Visualization Data.If you like, you can watch the animation within the Blender window by clicking the play button at the bottom of the screen.Now go back to CellBlender > Plot Output Settings and scroll to the bottom to click “Plot”. This will produce a plot of the amount of Z under simple regulation compared to the amount of Z for the feed-forward loop. Is it what you expected?Save your file, and then use the link below to return to the main text, where we will interpret the outcome of our simulation.Return to main text"
} ,
{
"title" : "Software Tutorial: Hunting for Loops in Transcription Factor Networks",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_loops",
"date" : "",
"content" : "In this tutorial, we will build a Jupyter Notebook to analyze loops in the E. coli transcription factor network, which can be downloaded here. If you would like to jump to the end of the analysis, you can download the complete Jupyter Notebook here.You will also need the following helper file:Python FileBefore running this tutorial, make sure that the following software and packages are installed. Warning: Be careful of the igraph installation and follow the website instructions carefully. When installing via pip or conda, specify “python-igraph” instead of “igraph”. Installation Link Version1 Check Install Python3 3.7 python –version Jupyter Notebook 4.4.0 jupyter –version python-igraph 0.8.0 conda list or pip list Create a blank Jupiter notebook titled loops.ipynb and start editing this file below. First, we import the transcription factor network and see how many nodes and edges there are, as well as count the number of loops.# NOTE: when installing via pip or conda, install python-igraphfrom igraph import *from network_loader import *import randomtxt_file = 'network_tf_tf_clean.txt'network, vertex_names = open_network(txt_file)# how many nodes & edgesprint("Number of nodes: ", len(network.vs))print("Number of edges: ", len(network.es))print("Number of self-loops: ", sum(Graph.is_loop(network)))If you run your notebook, you should obtain the following statistics. Number of nodes: 197 Number of edges: 477 Number of self-loops: 130We can also create a visualization of the network by adding the following line of code to our network.plot(network, vertex_label=vertex_names, vertex_label_size=8, edge_arrow_width=1, edge_arrow_size=0.5, autocurve=True)Running the notebook now produces the following network.Our plan is to compare this network against a random network. The following code will call a function from a package to generate a random network with 197 nodes and 477 edges and plot it. 
It uses a built in function called random.seed() that takes an integer as input and uses this function to initiate a (pseudo)random number generator that will allow us to generate a random network. There is nothing special about the input value 42 here – or is there?random.seed(42)g = Graph.Erdos_Renyi(197,m=477,directed=True, loops=True)plot(g, edge_arrow_width=1, edge_arrow_size=0.5, autocurve=True)The resulting network is shown in the figure below.The question is how many edges and self-loops this network has, which is handled by the following code.# how many nodes & edgesprint("Number of nodes: ", len(g.vs))print("Number of edges: ", len(g.es))print("Number of self-loops: ", sum(Graph.is_loop(g)))This code produces the following statistics for the random network. Number of nodes: 197 Number of edges: 477 Number of self-loops: 5The number of self-loops is significantly lower in the random network compared to the real transcription factor network.STOP: Change the input integer to random.seed to any integer you like. How does it affect the number of nodes, edges, and self-loops? Try changing the input to a few different values.Regardless of what seed value we use, we can confirm that the number of self-loops expected in a random graph is significantly lower than in the real E. coli network. Back in the main text, we will discuss this significance and then see if we can determine why autoregulation has arisen.Return to main text Other versions may be compatible with this code, but those listed are known to work for this tutorial. ↩ "
} ,
{
"title" : "Software Tutorial: Comparing Simple Regulation to Negative Autoregulation",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_nar",
"date" : "",
"content" : "Implementing simple regulation in CellBlenderIn this tutorial, we will compare simple against negative autoregulation using a particle-based simulation in CellBlender. We will start with simple regulation; first, load your CellBlender_Tutorial_Template.blend file from the Random Walk Tutorial. Save this file as NAR_comparison.blend. You may also download the completed tutorial files here.Then go to CellBlender > Molecules and create the following molecules: Click the + button. Select a color (such as yellow). Name the molecule Y1. Select the molecule type as Surface Molecule. Add a diffusion constant of 1e-6. Up the scale factor to 5 (click and type “5” or use the arrows).Repeat the above steps as needed to make sure that both of the following molecules are entered with the following parameters. Molecule Name Molecule Type Diffusion Constant Scale Factor Y1 Surface 1e-6 5 X1 Surface 1e-6 1 Now go to CellBlender > Molecule Placement to set the following sites to release our molecules: Click the + button. Select or type in the molecule X1. Type in the name of the Object/Region Plane. Set the Quantity to Release as 300.Finally, we set reactions. Go to CellBlender > Reactions and define the following reactions: Click the + button. Under reactants, type X1’ (note the apostrophe). Under products, type X1’ + Y1’. Set the forward rate as 2e2.Repeat the above steps as needed to ensure the following reactions are present. Reactants Products Forward Rate X1’ X1’ + Y1’ 4e2 Y1’ NULL 4e2 Go to CellBlender > Plot Output Settings to ensure that we will be able to plot the concentrations of our particles over time. Click the + button. Set the molecule name as Y1. Ensure World is selected. Ensure Java Plotter is selected. Ensure One Page, Multiple Plots is selected. Ensure Molecule Colors is selected.We are ready to run our simulation! Visit CellBlender > Run Simulation and select the following options: Set the number of iterations to 20000. 
Ensure the time step is set as 1e-6. Click Export & Run.Once the simulation has run, click CellBlender > Reload Visualization Data to visualize the outcome.You have the option of watching the animation within the Blender window by clicking the play button at the bottom of the screen.Now return to CellBlender > Plot Output Settings and scroll to the bottom to click Plot.You should be able to see Y reach a steady-state, at which the number of particles essentially levels off subject to some noise.Save your .blend file.Adding negative auto-regulation to the simulationNow that we have simulated simple regulation, we will implement negative autoregulation in CellBlender to compare how this system reaches steady state compared to the simple regulation system.Go to CellBlender > Molecules and create the following molecules: Click the + button. Select a color (such as yellow). Name the molecule Y2. Select the molecule type as Surface Molecule. Add a diffusion constant of 1e-6. Up the scale factor to 5 (click and type “5” or use the arrows).Repeat the above steps to make sure that we have all of the following molecules (X1 and Y1 are inherited from the simple regulation simulation). Molecule Name Molecule Type Diffusion Constant Scale Factor Y1 Surface 1e-6 5 X1 Surface 1e-6 1 Y2 Surface 1e-6 5 X2 Surface 1e-6 1 Now go to CellBlender > Molecule Placement to set the following molecule release sites: Click the + button. Select or type in the molecule X2. Type in the name of the Object/Region Plane. Set the Quantity to Release as 300.You should now have the following release sites. Molecule Name Object/Region Quantity to Release X1 Plane 300 X2 Plane 300 Next go to CellBlender > Reactions to create the following reactions: Click the + button. Under reactants, type X2’ (the apostrophe is important). Under products, type X2’ + Y2’. Set the forward rate as 2e2.Repeat the above steps as needed to ensure that you have the following reactions. 
Reactants Products Forward Rate X1’ X1’ + Y1’ 4e2 X2’ X2’ + Y2’ 4e2 Y1’ NULL 4e2 Y2’ NULL 4e2 Y2’ + Y2’ Y2’ 4e2 Go to CellBlender > Plot Output Settings to set up a plot as follows: Click the + button. Set the molecule name as Y2. Ensure World is selected. Ensure Java Plotter is selected. Ensure One Page, Multiple Plots is selected. Ensure Molecule Colors is selected.You should now have both Y1 and Y2 plotted. Molecule Name Selected Region Y1 World Y2 World We are now ready to run the simulation comparing simple regulation and negative autoregulation. To do so, go to CellBlender > Run Simulation and do the following: Set the number of iterations to 20000. Ensure the time step is set as 1e-6. Click Export & Run.Click CellBlender > Reload Visualization Data to visualize the result of the simulation.You have the option of watching the animation within the Blender window by clicking the play button at the bottom of the screen.Now return to CellBlender > Plot Output Settings and scroll to the bottom to click Plot.A plot should appear in which the plot of Y over time assuming simple regulation is shown in red, and the plot of Y if negatively autoregulated is shown in yellow.Save your file.Comparing simple regulation and negative autoregulationSTOP: Now that you have run the simulation comparing simple regulation and negative autoregulation, are the plots of Y for the two simulations what you would expect? Why or why not?If you find the outcome of the simulation in this tutorial confusion, don’t be concerned. In the main text, we will interpret this outcome and see if it allows us to start making conclusions about why negative autoregulation has evolved, or if we will need to further tweak our model.Return to main text"
} ,
{
"title" : "Software Tutorial: Ensuring a mathematically controlled simulation for comparing simple regulation to negative autoregulation",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_nar_mathematically_controlled",
"date" : "",
"content" : "In this tutorial, we will use CellBlender to adapt our simulation from the tutorial on negative autoregulation into a mathematically controlled simulation.First, open the file NAR_comparison.blend from the negative autoregulation tutorial and save a copy of the file as NAR_comparison_equal.blend. You may also download the completed tutorial files here.Now go to CellBlender > Reactions to scale up the simple regulation reaction in the negative autoregulation simulation as follows: for the reaction X2’ -> X2’ + Y2’, change the forward rate from 4e2 to 4e3.Next go to CellBlender > Run Simulation and ensure that the following options are selected: Set the number of iterations to 20000. Ensure the time step is set as 1e-6. Click Export & Run.Click CellBlender > Reload Visualization Data. You have the option of watching the animation within the Blender window by clicking the play button at the bottom of the screen.Now go back to CellBlender > Plot Output Settings and scroll to the bottom to click Plot; this should produce a plot. How does your plotSave your file before returning to the main text, where we will interpret the plot produced to see if we were able to obtain a mathematically controlled simulation and then interpret the result of this simulation from an evolutionary perspective.Return to main text"
} ,
{
"title" : "Software Tutorial: Implementing the Repressilator",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_oscillators",
"date" : "",
"content" : "In this tutorial, we will use CellBlender to build a particle-based simulation implementing a repressilator. First, load the CellBlender_Tutorial_Template.blend file from the Random Walk Tutorial and save a copy of the file as repressilator.blend. You may also download the completed tutorial file here.Then go to CellBlender > Molecules and create the following molecules: Click the + button. Select a color (such as yellow). Name the molecule Y. Select the molecule type as Surface Molecule. Add a diffusion constant of 1e-6. Up the scale factor to 5 (click and type “5” or use the arrows).Repeat the above steps to make sure that the following molecules are all entered with the appropriate parameters. Molecule Name Molecule Type Diffusion Constant Scale Factor X Surface 4e-5 5 Y Surface 4e-5 5 Z Surface 4e-5 5 HiddenX Surface 3e-6 3 HiddenY Surface 3e-6 3 HiddenZ Surface 3e-6 3 HiddenX_off Surface 1e-6 3 HiddenY_off Surface 1e-6 3 HiddenZ_off Surface 1e-6 3 Now go to CellBlender > Molecule Placement to establish molecule release sites by following these steps: Click the + button. Select or type in the molecule X. Type in the name of the Object/Region Plane. Set the Quantity to Release as 150.Repeat the above steps to make sure the following molecules are entered with the appropriate parameters as shown below. Molecule Name Object/Region Quantity to Release X Plane 150 HiddenX Plane 100 HiddenY Plane 100 HiddenZ Plane 100 Next go to CellBlender > Reactions to create the following reactions: Click the + button. Under reactants, type HiddenX’ (note the apostrophe). Under products, type HiddenX’ + X’. Set the forward rate as 2e3.Repeat the above steps for the following reactions, ensuring that you have the appropriate parameters for each reaction.Note: Some molecules require an apostrophe or a comma. This represents the orientation of the molecule in space and is very important to the reactions! 
Reactants Products Forward Rate HiddenX’ HiddenX’ + X’ 2e3 HiddenY’ HiddenY’ +Y’ 2e3 HiddenZ’ HiddenZ’ + Z’ 2e3 X’ + HiddenY’ HiddenY_off’ + X, 6e2 Y’ + HiddenZ’ HiddenZ_off’ + Y, 6e2 Z’ + HiddenX’ HiddenX_off’ + Z, 6e2 HiddenX_off’ HiddenX’ 6e2 HiddenY_off’ HiddenY’ 6e2 HiddenZ_off’ HiddenZ’ 6e2 X’ NULL 6e2 Y’ NULL 6e2 Z’ NULL 6e2 X, X’ 2e2 Y, Y’ 2e2 Z, Z’ 2e2 Go to CellBlender > Plot Output Settings to build a plot as follows: Click the + button. Set the molecule name as X. Ensure World is selected. Ensure Java Plotter is selected. Ensure One Page, Multiple Plots is selected. Ensure Molecule Colors is selected.Repeat the above steps for the following molecules. Molecule Name Selected Region X World Y World Z World We are now ready to run our simulation. Go to CellBlender > Run Simulation and select the following options: Set the number of iterations to 120000. Ensure the time step is set as 1e-6. Click Export & Run.Once the simulation has run, visualize the results of the simulation with CellBlender > Reload Visualization Data.Now go back to CellBlender > Plot Output Settings and scroll to the bottom to click Plot.Does the plot that you obtain look like a biological oscillator? As we return to the main text, we will interpret this plot and then see what will happen if we suddenly shift the concentration of one of the particles. Will the system still retain its oscillations?Return to main text"
} ,
{
"title" : "Software Tutorial: Perturbing the Repressilator",
"category" : "",
"tags" : "",
"url" : "/motifs/tutorial_perturb",
"date" : "",
"content" : "In this tutorial, we will see what happens when we make a sudden change to the concentration of one of the repressilator particles in the middle of the simulation. This is difficult to do with CellBlender, and so we will instead use this opportunity to transition to a “particle-free” tool called NFSim that does have the desired functionality. We will say much more about particle-free modeling, in which we do not have to track the movements of individual particles to track their concentrations, in a future module.First, you will need to install NFSim and a program called RuleBender, which we will use as a GUI for NFSim. Those two programs can be installed here. You may also download the completed tutorial file here.We will first build a simulation of the repressilator that we will perturb later. Assuming you have installed RuleBender, open the RuleBender program and select File > New BioNetGen Project.Select blank_file.bngl and name your project oscillators.Note: Occasionally the following error will pop up to inform the user: “There was a failure during the copy of the sample”. The folder will be created, but no files will be loaded. Select File > New > File to create a new blank file.Rename your file oscillator_copy.bngl and double-click the file in the navigator to open the editor window. 
Once in the editor window, add the following parameters: begin parameters r1 2e3 r2 6e2 r3 6e2 r4 2e2 r5 6e2 end parameters Next, add the molecules used as follows: begin molecule types x(Y~U~P) y(Y~U~P) z(Y~U~P) hx() hy() hz() hx_off() hy_off() hz_off() null() end molecule types Next, specify the quantities of each molecule at the start of the simulation: begin species x(Y~U) 150 y(Y~U) 0 z(Y~U) 0 hx() 100 hy() 100 hz() 100 hx_off() 0 hy_off() 0 hz_off() 0 null() 0 end species To view a plot of the molecules after the simulation is complete, add the following code: begin observables Molecules X x() Molecules Y y() Molecules Z z() end observables The following reaction rules and rates are the same as those used in the CellBlender tutorial on the repressilator. begin reaction rules # x copy hx() -> hx() + x(Y~U) r1 x(Y~U) + hy() -> hy_off() + x(Y~P) r2 hy_off() -> hy() r3 x(Y~P) -> x(Y~U) r4 x() -> null() r5 # y copy hy() -> hy() + y(Y~U) r1 y(Y~U) + hz() -> hz_off() + y(Y~P) r2 hz_off() -> hz() r3 y(Y~P) -> y(Y~U) r4 y() -> null() r5 # z copy hz() -> hz() + z(Y~U) r1 z(Y~U) + hx() -> hx_off() + z(Y~P) r2 hx_off() -> hx() r3 z(Y~P) -> z(Y~U) r4 z() -> null() r5 end reaction rules Finally, specify the type of simulation and number of frames to run using the following code: # i.e. 12,000 frames at 1e-6 timestep on CellBlender simulate_nf({t_end=>.06,n_steps=>60000}); Then, save your file. On the right-hand side, click Simulation > Run to run the simulation. After the simulation is complete, a new window will appear showing the plotted graph. As we can see, this appears to be the same behavior as the CellBlender plot but with a much cleaner pattern (this is because we do not have the noise incurred by having individual particles). We will now perturb the file and test the robustness of this oscillator model. In the Navigator window, right-click oscillator_copy.bngl and copy the file.
Paste a copy in the same folder and rename your file to oscillator_perturb.bngl. Add the following parameters to the parameters section of the file: # delay mechanic r6 1e7 r7 4e2 r8 1e3 r9 2e4 r10 1e3 Then add the following molecules to the molecules section: # delay mechanic delay() a(Y~U~P) b() null() Add the following to species: # Delay mechanic delay() 100 a(Y~U) 1000 b() 0 null() 0 Optional: add the following to observables: Molecules D delay() Molecules A a() Molecules B b() Finally, add the following to reaction rules. These rules act as a delayed spike to the y() molecule. Once the delay() molecule has sufficiently decayed into null(), the a() molecule will begin producing the b() molecule, which will in turn produce the y() molecule, disrupting our initial oscillations with a large influx of y(). # delay rules delay() + a(Y~P) -> delay() + a(Y~U) r6 delay() -> null() r7 a(Y~U) -> a(Y~P) r8 a(Y~P) -> b() r9 b() -> y(Y~U) r10 On the right side of the window, click Simulation > Run. After the simulation is complete, a new window will appear showing the plotted graph. Can you break the oscillator model, or is it just too robust? We recommend playing around with the reaction rules for b() – which other species could it produce? You could also adjust the starting quantities for a(Y~U~P) or change the rate at which the delay() molecule decays. In the main text, we will discuss the robustness of the repressilator and make a larger point about robustness in biology before we complete our work in this module. Return to main text"
} ,
{
"title" : "Searching for Local Differences in the SARS-CoV and SARS-CoV-2 Spike Proteins",
"category" : "",
"tags" : "",
"url" : "/coronavirus/multiseq",
"date" : "",
"content" : "In part 1 of this module, we used a variety of existing software resources to predict the structure of the SARS-CoV-2 spike protein from its amino acid sequence. We then discussed how to compare our predicted structures against the experimentally confirmed structure of the protein. Now begins part 2, in which we ask a simple question: how does the structure of the SARS-CoV-2 spike protein compare against the SARS-CoV spike protein? More importantly, in keeping with the biological maxim that the structure of a protein informs the function of that protein, can we find any clues lurking in the spike proteins’ structure that would indicate why the two viruses behave differently in humans? Why did SARS-CoV fizzle out while SARS-CoV-2 was infectious enough to cause a pandemic? Focusing on a variable region of interest in the spike protein We already know from our work in part 1 that when we compare the SARS-CoV and SARS-CoV-2 genomes, the spike protein is much more variable than other regions. We even see variable and conserved regions within the spike protein, as the following figure (reproduced from the section on homology modeling) indicates. Variable and conserved regions in the SARS-CoV and SARS-CoV-2 spike proteins. The S1 domain tends to be more variable, while the S2 domain is more conserved (and even has a small region of 100% similarity). Source: Jaimes et al. 2020.1 The most variable region between the two viruses in the spike protein is the receptor binding motif (RBM), part of the receptor binding domain (RBD) whose structure we predicted using GalaxyWEB in the homology modeling tutorial. The RBM is the component of the RBD that mediates contact with ACE2, as the following simplified animation of the process illustrates. Given that the RBM is so critical to the virus’s ability to bind to the target human enzyme, the fact that it has mutated so much from SARS-CoV to SARS-CoV-2 makes it a fascinating region of study.
Do the mutations that SARS-CoV-2 has accumulated make it easier for the virus to bind to human cells? Could this be why SARS-CoV-2 is more infectious than SARS-CoV? As we home in on the RBM, we provide an alignment of the 70 amino acid long RBM region from SARS-CoV and SARS-CoV-2 (as well as two animal viruses) in the figure below. A multiple alignment of the RBM (colored amino acids) across the human SARS-CoV virus (first row), a version of the virus isolated in a palm civet (second row), a virus isolated in a bat in 2013 (third row), and the SARS-CoV-2 virus (fourth row). Beneath each column, an asterisk denotes full conservation, a period denotes a slight mutation, and a colon indicates high variability.2 We know from our work in structure prediction that a greatly mutated protein sequence does not necessarily mean that the structure of the protein has changed much. Therefore, in this lesson, we will start a comparative structural analysis of the SARS-CoV and SARS-CoV-2 spike proteins to determine whether these mutations have contributed to higher infectiousness. All of this analysis will be performed using the software resources ProDy and VMD, which we briefly introduced earlier in the module. From protein structure to bound complexes Not only did researchers experimentally verify the structure of the spike protein of the two viruses, they also determined the structure of the RBD complexed with ACE2 in both SARS-CoV (PDB entry: 2ajf) and SARS-CoV-2 (PDB entry: 6vw1). The experimentally verified SARS-CoV-2 structure is a chimeric protein formed of the SARS-CoV RBD in which the RBM has the sequence from SARS-CoV-2.3
A chimeric RBD was used for complex technical reasons to ensure that the crystallization process during X-ray crystallography could be borrowed from that used for SARS-CoV. Because we know the structures of the bound complexes, we can produce 3-D visualizations of the two different complexes and look for structural differences involving the RBM. We will use VMD to produce this visualization, rotating the structures around to examine potential differences. However, we should be wary of trusting only our eyes to guide us; can we use a computational approach to tell us where to look for structural differences between the two RBMs? A first attempt at identifying local dissimilarities between protein structures In the previous lesson on assessing the accuracy of a predicted structure, we introduced a metric called root mean square deviation (RMSD) for quantifying the difference between two protein structures. RMSD offered an excellent method for a global comparison (i.e., a comparison across the entire structure), but we are interested in the local regions where the SARS-CoV and SARS-CoV-2 complexes differ. To this end, we will need an approach that examines individual amino acids in similar protein structures. STOP: How could we compare individual amino acid differences of two (similar) protein structures? Recall the following definition of RMSD for two protein structures s and t, in which each structure is represented by the positions of its n alpha carbons (s1, …, sn) and (t1, …, tn). \[\text{RMSD}(s, t) = \sqrt{\dfrac{1}{n} \cdot (d(s_1, t_1)^2 + d(s_2, t_2)^2 + \cdots + d(s_n, t_n)^2)}\] If two similar protein structures differ in a few locations, then the corresponding alpha carbon distances d(si, ti) will likely be higher at these locations. However, we will introduce a more sophisticated approach for comparing the local structure of si against ti.
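For concreteness, the RMSD formula above can be implemented in a few lines of NumPy. This is a minimal sketch assuming the two structures have already been superimposed (e.g., by the Kabsch algorithm) and are given as n × 3 arrays of alpha carbon coordinates; the function name is ours:

```python
import numpy as np

def rmsd(s, t):
    """Root mean square deviation between two pre-aligned structures.

    s, t: (n, 3) arrays of corresponding alpha carbon coordinates.
    """
    s, t = np.asarray(s, float), np.asarray(t, float)
    # d(s_i, t_i)^2 for each residue, then average and take the square root
    squared_dists = np.sum((s - t) ** 2, axis=1)
    return np.sqrt(squared_dists.mean())

# toy check: shifting a structure by the vector (3, 4, 0) moves every
# alpha carbon by distance 5, so the RMSD is exactly 5
s = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
t = s + np.array([3.0, 4.0, 0.0])
print(rmsd(s, t))  # 5.0
```

Note how a single large per-residue distance is averaged away across the whole sum, which is exactly why RMSD works as a global score but not as a local one.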
To do so, we first shift gears to discuss an alternative to RMSD for measuring global structural similarity. Contact maps and Qres One of the weaknesses of RMSD that we pointed out in part 1 of this module is that a change to a single bond angle at the i-th position may cause d(sj, tj) to be nonzero when j > i, even though the structure of the protein downstream of the bond angle change has not changed. For example, when we discussed the Kabsch algorithm, we showed the figure below of two protein structures that are identical except for a single bond angle. All of the alpha carbon distances d(si, ti) for i at least 4 will be thrown off by this changed angle. Two toy protein structures in which the bond angle between the third and fourth alpha carbon has been changed. This change does not affect the distance between the i-th and j-th alpha carbons when i and j are both at least 4. However, note that when i and j are both at least 4, the distance d(si, sj) between the i-th and j-th alpha carbons in S will still be similar to the distance d(ti, tj) between the same alpha carbons in T. This observation leads us to a more robust approach for measuring differences in two protein structures, which compares all pairwise distances d(si, sj) in one protein structure against the corresponding distances d(ti, tj) in the other structure. To help us visualize all these pairwise distances, we will introduce the contact map of a given protein structure, which is a binary 2-D matrix indicating whether two alpha carbons are near each other. After setting a threshold distance D, for a given structure s we set M(i, j) = 1 if the distance d(si, sj) is less than D, and M(i, j) = 0 if d(si, sj) is greater than or equal to D. The following figure shows the contact maps for the SARS-CoV-2 and SARS-CoV spike proteins (both full proteins and single chains) with a threshold distance D of twenty angstroms.
In this map, we color contact map values black if they are equal to 1 (close amino acids) and white if they are equal to 0 (distant amino acids). Note two things in the contact maps below. First, many black values cluster around the main diagonal of the matrix, since amino acids that are near each other in the protein sequence will remain near each other in the 3-D structure. Second, the contact maps for the two proteins are very similar, driving home further the similarity of the two proteins’ structures. Note: Interested in learning how to make contact maps? We will use ProDy to do so in a later section. The contact maps of the SARS-CoV-2 spike protein (top left), SARS-CoV spike protein (top right), single chain of the SARS-CoV-2 spike protein (bottom left), and single chain of the SARS-CoV spike protein (bottom right). If the distance between the i-th and j-th amino acids in a protein structure is 20.0 Å or less, then the (i, j)-th cell of the figure is colored black. We see that the SARS-CoV-2 and SARS-CoV spike proteins have very similar contact maps, indicating similar structures. STOP: How do you think the contact map will change as we raise or lower the threshold distance? Consider the i-th row (or column) of a protein’s contact map, which represents all alpha carbons that are near the i-th alpha carbon. We can see how two proteins differ at the i-th position by looking at all of this row’s values, that is, by comparing all of the d(si, sj) values against the corresponding d(ti, tj) values. We will now use pairwise distances between alpha carbons to determine how different two proteins are at the i-th alpha carbon, using a metric called Q per residue (Qres). The formal definition of Qres for two structures s and t is as follows:4 \[Q_{res}^{(i)} = \dfrac{1}{N-k} \sum^{residues}_{j\neq i-1,i,i+1} \textrm{exp}[-\dfrac{[d(s_i,s_j)-d(t_i,t_j)]^2}{2\sigma^2_{i,j}}]\] This equation includes the following parameters.
N is the number of residues in each protein; k is equal to 2 when i is at either the start or the end of the protein, and k is equal to 3 otherwise; the variance term \(\sigma_{ij}^2\) is equal to \(\left\lvert{i-j}\right\rvert ^{0.15}\), which corresponds to the sequence separation between the i-th and j-th alpha carbons. Note: The above definition assumes that the two proteins have the same length or have been pre-processed by removing amino acids that occur in only one protein. Generalizations of Qres for proteins of non-equal length do exist. If two proteins are very similar at the i-th alpha carbon, then d(si, sj) - d(ti, tj) will be close to zero for every j, and so each term inside the summation in the Qres equation will be close to 1. This summation has N - k terms, and so Qres will be close to 1. As two proteins become more different at the i-th alpha carbon, the terms inside the summation head toward zero, and so does the Qres value. Therefore, Qres is a similarity metric ranging between 0 and 1, with low scores representing low similarity between two proteins at the i-th position, and high scores representing high similarity at this position. We now turn to a tutorial that will compute Qres for the SARS-CoV and SARS-CoV-2 spike proteins. This tutorial will use the VMD plugin Multiseq, a bioinformatics analysis environment. We will use Multiseq to align the SARS-CoV-2 (chimeric) RBD and SARS-CoV RBD using the PDB entries 6vw1 and 2ajf, respectively.
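Before turning to Multiseq, the contact map and Qres definitions above translate directly into NumPy. The following sketch is our own minimal prototype (not the Multiseq plugin); it assumes two equal-length structures given as n × 3 alpha carbon coordinate arrays, uses sigma^2_{i,j} = |i - j|^0.15, and excludes j = i - 1, i, i + 1 exactly as in the formula:

```python
import numpy as np

def pairwise_distances(coords):
    """n x n matrix of distances d(s_i, s_j) between alpha carbons."""
    coords = np.asarray(coords, float)
    diffs = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

def contact_map(coords, threshold=20.0):
    """Binary map: M[i, j] = 1 when d(s_i, s_j) < threshold."""
    return (pairwise_distances(coords) < threshold).astype(int)

def qres(s, t):
    """Per-residue similarity (between 0 and 1) of structures s and t."""
    ds, dt = pairwise_distances(s), pairwise_distances(t)
    n = len(ds)
    q = np.zeros(n)
    for i in range(n):
        # exclude j = i - 1, i, i + 1, i.e. keep |i - j| >= 2
        js = [j for j in range(n) if abs(i - j) >= 2]
        sigma_sq = np.array([abs(i - j) ** 0.15 for j in js])
        diffs = ds[i, js] - dt[i, js]
        terms = np.exp(-(diffs ** 2) / (2 * sigma_sq))
        q[i] = terms.mean()   # mean over the N - k included terms
    return q

# identical structures agree everywhere, so every Qres value is 1
rng = np.random.default_rng(0)
s = rng.random((10, 3)) * 30
print(np.allclose(qres(s, s), 1.0))  # True
```

Taking the mean over the included terms divides by N - k automatically, since exactly 2 terms are excluded at the chain ends and 3 in the interior.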
After determining Qres, we will visualize the individual locations where the two RBD regions differ. Visit tutorial Local comparison of spike proteins leads us to a region of interest In the tutorial, we formed a “structural” alignment of the two coronavirus RBD regions, in which blue columns correspond to similar areas of the structure (high Qres) and red columns correspond to dissimilar areas of the structure (low Qres). If we zoom in on the region around position 150 of the alignment, we find a 13-column region of the alignment within the RBD region for which Qres values are significantly lower than they are elsewhere. This region corresponds to positions 476 to 485 in the SARS-CoV-2 spike protein and is shown in the figure below. (Top) A snapshot of the sequence alignment between the SARS-CoV RBD (first row) and the SARS-CoV-2 chimeric RBD (second row).3 Columns are colored along a spectrum from blue (high Qres) to red (low Qres), with positions that correspond to an inserted or deleted amino acid colored red. (Bottom) Zooming in on a region of the alignment with low Qres, which corresponds to amino acids at positions 476 to 485 in the SARS-CoV-2 spike protein. We can also create a 3-D visualization of the structures. The figure below shows the superimposed structures of both the SARS-CoV and SARS-CoV-2 RBDs bound with ACE2, which is shown in green. The same color-coding of columns of the multiple alignment in the figure above is used to highlight differences between the SARS-CoV and SARS-CoV-2 structures; that is, blue represents regions of high Qres, and red represents regions of low Qres. The low-Qres region of the RBM alignment that we highlighted in the above figure is outlined in the figure below. A visualization showing the superposed structures of the SARS-CoV-2 chimeric RBD3 and the SARS-CoV RBD, colored blue and red based on Qres. Blue indicates high Qres and red indicates low Qres. ACE2 is shown in green.
The highlighted region corresponds to the part of the RBM with a potential structural difference. Because it is adjacent to ACE2, it is likely that the structural difference here will affect ACE2 interactions. Note: Although the rest of the proteins are similar, there are other parts of the RBD at the top of the protein that show dissimilarities in the two proteins, which may be attributable to an experimental artifact. The authors of the work in which the comparison was published have pointed out that the highlighted region is unlikely to be an artifact of the experiment because it is “buried at the RBD–ACE2 interface and did not affect crystallization”. Finding this highlighted region in the RBM where the structures of the SARS-CoV and SARS-CoV-2 spike proteins differ is an exciting development. In the next lesson, we will further explore this small region of the protein structure and see how the mutations acquired by SARS-CoV-2 may have influenced the binding affinity of the virus spike protein with the human ACE2 enzyme. Next lesson Jaimes, J. A., André, N. M., Chappie, J. S., Millet, J. K., & Whittaker, G. R. 2020. Phylogenetic Analysis and Structural Modeling of SARS-CoV-2 Spike Protein Reveals an Evolutionarily Distinct and Proteolytically Sensitive Activation Loop. Journal of Molecular Biology, 432(10), 3309–3325. https://doi.org/10.1016/j.jmb.2020.04.009 Wan, Y., Shang, J., Graham, R., Baric, R. S., Li, F. 2020. Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. Journal of Virology, 94(7), e00127-20. Shang, J., Ye, G., Shi, K., Wan, Y., Luo, C., Aihara, H., Geng, Q., Auerbach, A., Li, F. 2020. Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221–224. https://doi.org/10.1038/s41586-020-2179-y Li, L., Sethi, A., Luthey-Schulten, Z. Evolution of Translation Class I Aminoacyl-tRNA Synthetase:tRNA complexes. University of Illinois at Urbana-Champaign, Luthey-Schulten Group, NIH Resource for Macromolecular Modeling and Bioinformatics, Computational Biophysics Workshop. https://www.ks.uiuc.edu/Training/Tutorials/TCBG-copy/evolution/evolution_tutorial.pdf "
} ,
{
"title" : "ProDy",
"category" : "",
"tags" : "",
"url" : "/coronavirus/prody",
"date" : "",
"content" : "ProDy is an open-source Python package that allows users to perform protein structural dynamics analysis. Its flexibility lets users select specific parts or atoms of a structure for normal mode analysis and structure comparison. Please be sure to have the following installed: Python (2.7, 3.5, or later), ProDy, NumPy, Biopython, IPython, and Matplotlib. Getting Started It is recommended that you create a workspace for storing files created when using ProDy, as well as protein .pdb files. Make sure you are in your workspace before starting up IPython: ipython --pylab Import functions and turn interactive mode on (you only need to do this once per session): In[#]: from pylab import * In[#]: from prody import * In[#]: ion() VMD Tutorial"
} ,
{
"title" : "A Reaction-Diffusion Model Generating Turing Patterns",
"category" : "",
"tags" : "",
"url" : "/prologue/animals",
"date" : "",
"content" : "From random walks to reaction-diffusion In the previous section, we introduced the random walk model of a particle diffusing through a medium as a result of Brownian motion. But what exactly does the random movement of particles have to do with Alan Turing and zebras? Turing’s insight was that remarkable patterns could emerge if we combine a simulation of diffusion with a chemical reaction, in which colliding particles interact with each other. Such a model is called a reaction-diffusion system, and the patterns that emerge in the simulation are called Turing patterns in Turing’s honor. An example reaction-diffusion system We will consider a reaction-diffusion system having two types of particles, A and B. The system is not meant to represent a predator-prey relationship, but you may like to think of the A particles as prey and the B particles as predators for reasons that will become clear soon. Both types of particles diffuse randomly through the plane, but the A particles typically diffuse more quickly than the B particles. In the simulation that follows, we will assume that A particles diffuse twice as quickly as B particles. In terms of the random walk, this faster rate of Brownian motion means that in a single “step”, an A particle moves twice as far as a B particle. STOP: Say that we release one A particle and one B particle at the same location. If the two particles move via random walks, and the rate of diffusion of A is twice as fast, then how much farther from the origin will A be than B after n steps? We will now add some reactions to our system. The A particles are added into the system at some constant feed rate f, meaning that these particles are created as the result of other reactions that are not part of our model. In a three-dimensional system, the units of f are mol/L/s, meaning that every second, f moles of particles are added to the system in every liter of volume.
(Recall from your chemistry class long ago that one mole is 6.02214076 · 10^23 particles.) As a result, the concentration of the particles increases by a constant number in each time step. There is also a kill rate constant k dictating the rate of removal of the B particles. As a result of removal, the number of B particles in the system will decrease by a factor of k in a given time step; that is, the more B particles that are present, the more B particles will be removed. Note that there is a slight difference between the feed and kill reactions. In the first reaction, the number of A particles increases by a constant number in each time step. In the second reaction, the number of B particles decreases by a constant factor multiplied by the current number of B particles. In terms of calculus, this means that if [A] and [B] denote the concentrations of the two particle types, then in the absence of other reactions, we can write d[A]/dt = f and d[B]/dt = -k · [B]. Finally, our reaction-diffusion system includes the following reaction involving both particle types, where the particles on the left side of the reaction are called reactants and the particles on the right side are called products: A + 2B → 3B. To simulate this reaction on a particle level, if an A particle and two B particles collide with each other, then the A particle has some fixed probability r of being replaced by a third B particle. This third reaction is why we compared A to prey and B to predators, since we may like to conceptualize the reaction as two B particles consuming an A particle and producing an offspring B particle. Parameters are omnipresent in biological modeling Our plan is to initiate the system with a uniform concentration of A particles spread across the grid and a tightly packed collection of B particles in the center of the grid.
But before we do this, we first point out that the results of our simulation may vary depending on a few factors. A parameter is a variable quantity used as input to a model. Parameters are inevitable in biological modeling (and data science in general), and as we will see, changing parameters can cause major changes in the behavior of a system. Note that there are four parameters relevant to our reaction-diffusion system. Three of these parameters are the feed rate (f) of A particles, the kill rate of the B particles (k), and the rate of the predator-prey reaction (r). The final parameter of interest corresponds to the diffusion rates (i.e., speeds) of the two types of particle. We report this as a single parameter because the diffusion rates are completely dependent on each other; once the diffusion rate of the B particles is set, the diffusion rate of the A particles must be twice that value. You can think of all these parameters as dials we can turn, observing how the system changes on the macro level. For example, if we raise the diffusion rate, then the particles will be moving around and bouncing into each other more, which means that we will see more of the reaction A + 2B → 3B. STOP: What will happen as we increase or decrease the feed rate f? What about the kill rate k? A reaction like A + 2B → 3B is typically thought of as occurring at a bulk reaction rate, which is the total number of reactions occurring as a function of the concentration of reactants. In the following tutorial, CellBlender uses the software MCell to simulate our reaction-diffusion model; MCell is built upon some advanced probabilistic methods that allow it to use the bulk reaction rate to determine the probability that a reaction will happen if the particles needed as reactants collide. The same goes for the feed and kill reactions; new A particles are formed, and old B particles are destroyed, via probabilities that are computed from reaction rates.
For now, you can think of the rate of a reaction as directly related to its probability of occurring. When we return from this tutorial, we will examine the patterns that it produces. Visit tutorial Tuning reaction-diffusion parameters produces different Turing patterns For some parameter values, our reaction-diffusion system is not particularly interesting. For example, the following animation is produced when using parameter rates in CellBlender of f = 1000 and k = 500,000. It shows that if the kill rate is too high, then the B particles will die out more quickly than they can be replenished by the reaction with A particles, and so only A particles will be left. In this animation, A particles have been colored green, and B particles have been colored red. On the other hand, if f is too high, then there will be an increase in the concentration of A particles. However, there will also be more interactions between A particles and pairs of B particles, and so we will see an explosion in the number of predators. The following simulation has the parameters f = 1,000,000 and k = 100,000. The interesting behavior in this system lies in a sweet spot of the parameters f and k. For example, consider the following visualization when f is equal to 100,000 and k is equal to 200,000. We see a very clear stripe of predators expanding outward against a background of prey, with subsequent stripes appearing at locations where there is a critical mass of predators to interact with each other. When we hold k fixed and increase f to 140,000, the higher feed rate increases the likelihood of B particles encountering A particles, and so we see even more cascading waves. Note the clear red-green stripes that have appeared by the end of the movie. As f approaches k, the stripe structure becomes chaotic and breaks down because there are so many pockets of B particles that these particles constantly collide and mix with each other.
The following animation shows the result of raising f to 175,000. Once f and k are equal, the stripes will disappear. We might expect this to mean that the B particles will be uniformly distributed across a background of A particles. But what we see is that after an initial outward explosion of B particles, the system produces a mottled background, with pockets having higher or lower concentration of B. Pay attention to the following video at a point late in the animation. Although the concentrations of the particles are still changing, there is much less large-scale change than in earlier videos. If we freeze the video, our eye cannot help but see patterns of red and green clusters that resemble spots (or at the very least mottling). Turing’s patterns and Klüver’s hallucinations When you look at the simulations above, an adjective that may have come to mind is “trippy”. This is no accident. Research dating all the way back to the 1920s has studied the patterns that we see during visual hallucinations, which Heinrich Klüver named form constants after studying patients who had taken mescaline.1 Form constants, which include cobwebs, tunnels, and spirals, occur across many individuals regardless of the cause of the hallucinations. Over five decades after Klüver’s work, researchers would determine that form constants having different shapes originate from simpler linear stripes of cellular activation patterns in the retina. The retina is circular, but the brain needs to convert this cellular image into a rectangular field of view; as a result, when the linear patterns are passed to the visual cortex, the hallucinating brain contorts them into the spirals and whirls that we see.2 Yet this research had essentially replaced one question with another: why does hallucination cause patterns of cellular activation in the retina?
This question is still unresolved, but some researchers3 believe that the linear patterns produced by hallucinations in the retina are in fact Turing patterns and can be explained by a reaction-diffusion model of firing neurons. Streamlining our simulations Despite using advanced modeling and rendering software that has undergone years of development and optimization, each of the visualizations in this lesson took several hours to render. These simulations are computationally intensive because they require us to track the movement of tens of thousands of particles over thousands of generations. We wonder if it is possible to build a model of Turing patterns that does not require so much computational overhead. In other words, is there a simplification that we can make to our model that will run faster but still produce Turing patterns? We will turn our attention to this question in the next section. Next lesson H. Klüver. Mescal and Mechanisms of Hallucinations. University of Chicago Press, 1966. G.B. Ermentrout and J.D. Cowan. “A Mathematical Theory of Visual Hallucination Patterns”. Biol. Cybernetics 34, 137-150 (1979). J. Ouellette. “A Math Theory for Why People Hallucinate”. Quanta Magazine, July 30, 2018. https://www.quantamagazine.org/a-math-theory-for-why-people-hallucinate-20180730/ "
} ,
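For intuition about the sweet spot in f and k described above, the three reactions can first be combined into a well-mixed model with no diffusion at all. The sketch below is our own illustration (not CellBlender/MCell): Euler integration of d[A]/dt = f - r[A][B]^2 and d[B]/dt = r[A][B]^2 - k[B], where the r[A][B]^2 term is the mass-action bulk rate of A + 2B → 3B. With f = k = r = 1 there is a fixed point at [A] = [B] = 1, and concentrations started nearby stay positive and close to it over a short run:

```python
def simulate(f, k, r, a0, b0, dt=0.001, steps=2000):
    """Euler integration of the well-mixed feed/kill/predation system.

    da/dt = f - r*a*b**2   (constant feed, minus the A + 2B -> 3B reaction)
    db/dt = r*a*b**2 - k*b (the reaction produces B; the kill removes B)
    """
    a, b = a0, b0
    history = [(a, b)]
    for _ in range(steps):
        reaction = r * a * b * b          # bulk rate of A + 2B -> 3B
        a += dt * (f - reaction)
        b += dt * (reaction - k * b)
        history.append((a, b))
    return history

# f = k = r = 1 gives a fixed point at a = b = 1; starting nearby,
# the concentrations orbit close to that fixed point
traj = simulate(f=1.0, k=1.0, r=1.0, a0=1.2, b0=0.9)
a_end, b_end = traj[-1]
print(round(a_end, 2), round(b_end, 2))
```

The spatial Turing patterns of the next lesson arise when these same kinetics are coupled cell-to-cell by diffusion, with A diffusing faster than B.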
{
"title" : "The Gray-Scott Model: A Turing Pattern Cellular Automaton",
"category" : "",
"tags" : "",
"url" : "/prologue/blocks",
"date" : "",
"content" : "A coarse-grained model of single particle diffusion Part of the modeler’s work is not only to build models but to look for simple models of a system that capture the essence of what is being modeled and that can be run quickly and scaled to large inputs. Imagine, for example, how much computational power would be needed to build a particle-based model of your brain; the only way to study such a complicated system is by making simplifications. In our case, we have a very “fine-grained” reaction-diffusion model illustrating Turing patterns, and we will now present a faster “coarse-grained” model that will allow us to visualize Turing patterns. To do so, we will stop keeping track of individual particles; instead, we grid off two-dimensional space into blocks and store only the concentration of particles in each block (i.e., the number of particles in the block). To make things even simpler, we assume that there is some maximum concentration of particles possible, so that we can divide the number of particles by this maximum concentration and obtain a decimal number between 0 and 1. Let us begin with a simple example of the diffusion of only A particles (we will later add B particles as well as reactions to our model). Say that the particles are at maximum concentration in the central cell of our grid and are present nowhere else, as the following figure illustrates. A 5 x 5 grid showing hypothetical initial concentrations of A particles. Cells are labeled by numbers between 0 and 1 representing their concentration of a single particle. In this example, the central cell has maximum concentration, and no particles are contained in any other cell. We will now update the grid of cells after one time step in a way that mimics diffusion.
To do so, we will spread out the concentration of particles in each square to its eight neighbors; one way of doing so is to assume that 20% of the current cell’s concentration diffuses to each of its four adjacent neighbors, and that 5% of the cell’s concentration diffuses to its four diagonal neighbors. Because the central square in our ongoing example is the only cell with any particles, the updated concentrations of our particle after a single time step are shown in the following figure.A grid showing an update to the system in the previous figure after diffusion of particles after a single time step.After an additional time step, the particles continue to diffuse outward. For example, each diagonal neighbor of the central cell in the above figure, which has a concentration of 0.05, will lose all of its particles in the next step. This cell will also gain 20% of the particles from two of its adjacent neighbors, along with 5% of the particles from the central square (which doesn’t have any particles). This makes the updated concentration of this cell equal to 0.2(0.2) + 0.2(0.2) + 0.05(0) = 0.04 + 0.04 + 0 = 0.08.Each of the four cells adjacent to the central square will receive 20% of the particles from two of its adjacent neighbors, which have a concentration of 0.05 each. Such a cell will also receive 5% of the particles from two of its diagonal neighbors, which have a concentration of 0.2. Therefore, the updated concentration of each of these cells is 2(0.2)(0.05) + 2(0.05)(0.2) = 0.02 + 0.02 = 0.04.Finally, the central square receives 20% of the particles from each of its four adjacent neighbors, as well as 5% of the particles from each of its four diagonal neighbors. 
As a result, the central square’s concentration is updated to be 4(0.2)(0.2) + 4(0.05)(0.05) = 0.16 + 0.01 = 0.17.As a result, the central nine squares after two time steps are as shown in the following figure.A grid showing an update to the central nine squares of the diffusion system in the previous figure after an additional time step. The cells labeled “?” are left as an exercise for the reader.STOP: What should the values of the “?” cells be in the above figure? Note that these cells are neighbors of cells with positive concentrations after one time step, so their concentrations should be positive. Click here to see the answer.The coarse-grained model of particle diffusion that we have built is a variant of a cellular automaton, or a grid of cells in which we use fixed rules to update the status of a cell based on its current status and those of its neighbors. Cellular automata form a rich area of research applied to a wide variety of fields dating back to the middle of the 20th Century; if you are interested in learning more about them from the perspective of programming, check out the Programming for Lovers project.Slowing down the rate of diffusionThere is just one problem. Our cellular automaton model of diffusion is too volatile! In a true diffusion process, all of the particles would not rush out of the central square in a single time step.Our solution is to add a parameter dA representing the rate of diffusion of A. Instead of moving a cell’s entire concentration of particles to its neighbors in a single time step, we move only the fraction dA of them.To revisit our original example, say that dA is equal to 0.2. After the first time step, only 20% of the central cell’s particles will be spread to its neighbors. 
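As an aside (our own illustrative sketch, not part of the original lesson), the diffusion update described above can be written in a few lines of plain Python. The weights follow the text (20% of the outgoing fraction to each adjacent neighbor, 5% to each diagonal neighbor); the function name diffuse is our own:

```python
# Minimal sketch of one coarse-grained diffusion step (plain Python, no dependencies).
# A fraction d of each cell's concentration leaves the cell: 20% of that amount goes
# to each of the four adjacent neighbors and 5% to each of the four diagonal neighbors.
def diffuse(grid, d):
    n, m = len(grid), len(grid[0])
    new = [[0.0] * m for _ in range(n)]
    neighbors = [(-1, 0, 0.2), (1, 0, 0.2), (0, -1, 0.2), (0, 1, 0.2),
                 (-1, -1, 0.05), (-1, 1, 0.05), (1, -1, 0.05), (1, 1, 0.05)]
    for i in range(n):
        for j in range(m):
            total = grid[i][j] * (1 - d)  # the fraction that stays put
            for di, dj, w in neighbors:
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < m:
                    total += w * d * grid[ni][nj]  # inflow from each neighbor
            new[i][j] = total
    return new

grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0              # maximum concentration in the central cell
step = diffuse(grid, 0.2)     # d = dA = 0.2
print(step[2][2], step[2][1], step[1][1])
```

Running this with dA = 0.2 reproduces the central-cell example: the center drops to 0.8, each adjacent neighbor receives 0.04, and each diagonal neighbor receives 0.01.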
The figure below illustrates that the central square is updated to 0.8, its adjacent neighbors are updated to 0.2dA = 0.04, and its diagonal neighbors are updated to 0.05dA = 0.01.An updated grid of cells showing the concentration of A particles after one time step if dA = 0.2.Adding a second particle to our diffusion simulationWe now will add particle B to the simulation, which also starts with 100% concentration in the central square. Recall that B, our “predator” molecule, diffuses half as fast as A, the “prey” molecule. If we set the diffusion rate dB equal to 0.1, then our cells after a time step will be updated as shown in the figure below. This figure represents the concentration of the two particles in each cell as an ordered pair ([A], [B]).A figure showing cellular concentrations after one time step for two particles A and B diffusing at rates dA = 0.2 and dB = 0.1. Each cell is labeled by the ordered pair ([A], [B]).STOP: Update the cells in the above figure after another generation of diffusion. Use the diffusion rates dA = 0.2 and dB = 0.1.In the following tutorial, we will implement the cellular automaton using a Jupyter notebook and visualize how well this automaton mimics the diffusion of A and B particles. We will then continue on in the next section with adding reactions to our automaton model.Visit tutorialAdding reactions and completing the Gray-Scott modelNow that we have established a cellular automaton for tracking concentrations of two types of particles as they diffuse, we will add the following three reactions to complete the model. A “feed” reaction in which new A particles are fed into the system at a constant rate. A “death” reaction in which B particles are removed from the system at a rate proportional to their current concentration. A “reproduction” reaction A + 2B → 3B.STOP: How might we incorporate these reactions into our automaton?We will address these reactions one at a time. 
First, we have the feed reaction, which takes place at a feed rate. It is tempting to simply add some constant value f to the concentration of each cell in each time step. However, we want to avoid a situation in which the concentration of A particles is close to 1 and the feed reaction causes the concentration of A particles to exceed 1.Instead, if a given cell has current concentration [A], then we will add f(1-[A]) to the concentration of the cell. For example, if [A] is 0.01, then we will add 0.99f to the cell because the current concentration is low. If [A] is 0.8, then we will only add 0.2f to the concentration.Second, we consider the death reaction of B particles, which takes place at a kill rate. Recall from the previous lesson that the kill rate is proportional to the current concentration of B particles. As a result, if a cell has concentration [B], then for some constant k between 0 and 1, we will subtract k · [B] from the concentration of B particles.Third, we have the reproduction reaction A + 2B → 3B. The higher the concentration of A and B, the more this reaction will take place. Furthermore, because we need two B particles in order for the collision to occur, the reaction should be even more rare if we have a low concentration of B than if we have a low concentration of A. Therefore, if a given cell is represented by the concentrations ([A], [B]), then we will subtract [A] · [B]2 from the concentration of A and add [A] · [B]2 to the concentration of B in the next time step.Let us consider an example of how a single cell might update its concentration of both particle types as a result of reaction and diffusion. Say that we have the following hypothetical parameter values: dA = 0.2 dB = 0.1 f = 0.3 k = 0.4Furthermore, say that our cell has the concentrations ([A], [B]) = (0.7, 0.5). Then as a result of diffusion, the cell’s concentration of A will decrease by 0.7 · dA = 0.14, and its concentration of B will decrease by 0.5 · dB = 0.05. 
It will also receive particles from neighboring cells; for example, say that it receives an increase to its concentration of A by 0.08 and an increase to its concentration of B by 0.06 as the result of diffusion from neighbors.Now let us consider the three reactions. The feed reaction will cause the cell’s concentration of A to increase by (1 - [A]) · f = 0.09. The death reaction will cause its concentration of B to decrease by k · [B] = 0.2. And the reproduction reaction will mean that the concentration of A decreases by [A] · [B]2 = 0.175, with the concentration of B increasing by the same amount.As the result of all these processes, we update the concentrations of A and B to the following values ([A]’, [B]’) in the next time step.[A]’ = 0.7 - 0.14 + 0.08 + 0.09 - 0.175 = 0.555[B]’ = 0.5 - 0.05 + 0.06 - 0.2 + 0.175 = 0.485Applying these cell-based reaction-diffusion computations over all cells in parallel and over many generations forms a cellular automaton called the Gray-Scott model1. We should now feel confident expanding the Jupyter notebook from the previous diffusion tutorial to include the additional three reactions. The question is: will we still see Turing patterns?Visit tutorialReflection on the Gray-Scott modelIn contrast to using a particle simulator, our Jupyter Notebook demo probably produced an animation of Turing patterns in under a minute on your computer.To visualize the changing concentrations in each cell, we use a color map to color each cell based on its concentrations. Specifically, we plot a cell’s color based on its value of the concentration of predators divided by the sum of the concentrations of predators and prey. If a cell has a value close to zero for this ratio (meaning very few predators compared to prey), then it will be colored red, while if it has a value close to 1 (meaning many predators), then it will be colored dark blue. 
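The single-cell bookkeeping above can be checked with a short Python snippet (our own illustration, not taken from the tutorial). The concentrations and parameters are those of the example; in_A and in_B are the hypothetical diffusion inflows from neighboring cells:

```python
# Check the worked single-cell update from the text.
A, B = 0.7, 0.5
dA, dB, f, k = 0.2, 0.1, 0.3, 0.4
in_A, in_B = 0.08, 0.06  # hypothetical inflows from neighbors

A_next = A - dA * A + in_A + f * (1 - A) - A * B * B   # outflow, inflow, feed, reproduction
B_next = B - dB * B + in_B - k * B + A * B * B         # outflow, inflow, death, reproduction
print(round(A_next, 3), round(B_next, 3))
```

The snippet reproduces the updated concentrations ([A]’, [B]’) = (0.555, 0.485) computed above.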
The Spectral color map that we use is shown in the figure below.The following animation shows the Gray-Scott model running with the parameters f = 0.034 and k = 0.095.If we expand the size of the simulation and add new predator locations to the grid, then the patterns become more complex as they intersect.If we keep the feed rate constant and tweak the kill rate ever so slightly to k = 0.097, then the patterns change significantly into spots.If we make the prey a little happier as well, raising f to 0.038 and k to 0.099, then we have a different striped pattern.And if we raise f to 0.042 and k to 0.101, then again we see a spot pattern.The point that we are making here is that very slight changes in our model’s parameters can produce drastically different results in terms of the patterns that we witness. In this prologue’s conclusion, we will say more about this and connect this observation back to our original motivation of patterns on animals’ skin.Next lesson P. Gray and S.K. Scott, Autocatalytic reactions in the isothermal, continuous stirred tank reactor: isolas and other forms of multistability, Chemical Engineering Science 38 (1983) 29-43. ↩ "
} ,
{
"title" : "Conclusion: Turing Patterns are Fine-Tuned",
"category" : "",
"tags" : "",
"url" : "/prologue/conclusion",
"date" : "",
"content" : "The Turing patterns that emerged from our particle simulations are a testament to the human eye’s ability to find organization within the net behavior of tens of thousands of particles. For example, take another look at the video we produced that showed mottling in our particle simulator. Patterns are present, but they are also noisy — even in the dark red regions we will have quite a few green particles, and vice-versa. The rapid inference of large-scale patterns from small-scale visual phenomena is one of the tasks that our brains have evolved to perform well.Our reaction-diffusion system is remarkable because it is so fine-tuned, meaning that very slight changes in parameter values can lead to significant changes in the system. These changes could convert spots to stripes, or they could influence how clearly defined the boundaries of the Turing patterns are.Robert Munafo provides a great figure, reproduced below, showing how the Turing patterns produced by the Gray-Scott model change as the kill and feed rates vary.1 The kill rate increases along the x-axis, and the feed rate increases along the y-axis. Notice how quickly the patterns change! You may like to try tweaking the parameters of our own Gray-Scott simulator to see if you can reproduce these different patterns. Changing feed and kill parameters affects the Turing patterns produced in the Gray-Scott model. Later in this course, we will see an example of a biological system that is the opposite of fine-tuned. In a robust system, variation in parameters does not lead to substantive changes in the ultimate behavior of the system. Robust processes are vital for processes in which an organism needs to be resilient to small changes in its environment.It turns out that although Turing’s work offers a compelling argument for how zebras might have gotten their stripes, the exact mechanism by which these stripes form is still an unresolved question. 
However, the pigmentation of zebrafish does follow a Turing pattern because two types of pigment cells follow a reaction-diffusion model much like the one we presented above.2Furthermore, note the following two photos of giant pufferfish.34 These fish are genetically very similar, but their skin patterns are very different. What may seem like a drastic change in the appearance of the fish from spots to stripes is likely attributable to a small change of parameters in a fine-tuned biological system that, like all of life, is powered by randomness. Two similar pufferfish exhibiting very different skin patterns. A final noteThank you for making it this far! We hope that you are enjoying the course. You can join the next module of the course by clicking on the “next module” button below. In the meantime, we ask that you complete the course survey if you have not done so already.Next module “Reaction-Diffusion by the Gray-Scott Model: Pearson’s Parametrization” © 1996-2020 Robert P. Munafo https://mrob.com/pub/comp/xmorphia/index.html ↩ Nakamasu, A., Takahashi, G., Kanbe, A., & Kondo, S. (2009). Interactions between zebrafish pigment cells responsible for the generation of Turing patterns. Proceedings of the National Academy of Sciences of the United States of America, 106(21), 8429–8434. https://doi.org/10.1073/pnas.0808622106 ↩ NSG Coghlan, 2006 Creative Commons Attribution-Share Alike 3.0 Unported ↩ Chiswick Chap, 20 February 2012, Creative Commons Attribution-Share Alike 3.0 Unported ↩ "
} ,
{
"title" : "Exercises",
"category" : "",
"tags" : "",
"url" : "/prologue/exercises",
"date" : "",
"content" : "Exercises Make a cell update an exercise at end. Good exercise on changing the diffusion rates outside of what is specified by Gray-Scott. Good questions below. May need to be exercises. Visiting Robert P. Munafo https://mrob.com/pub/comp/xmorphia/index.html simulations and matching some more up to our Jupyter Notebook simulations The Gray-Scott reactions always seem to reach a steady state where the simulation stops moving. Are you able to find parameters or predator placement which creates an oscillatory pattern? Play with the seed_size in the Gray-Scott jupyter notebook. What does this parameter control? STOP: Is it ever possible for a square to have a concentration greater than 1? Why or why not?STOP: Note that the concentrations of all of the particles add up to 1 in each step. Do you think that this must always be true?Next module"
} ,
{
"title" : "Software Tutorial: Implementing the Gray-Scott Model for Coarse-Grained Reaction-Diffusion with Jupyter Notebook",
"category" : "",
"tags" : "",
"url" : "/prologue/gs-jupyter",
"date" : "",
"content" : "The following tutorial will use a Jupyter Notebook to implement the Gray-Scott model. It requires a familiarity with Python, and installation instructions can be found in our coarse-grained diffusion tutorial. You may also download the completed tutorial file here.Assuming you have Jupyter notebook, create a new file called gray-scott.ipynb (you may instead want to duplicate and modify your diffusion_automaton.ipynb file from the diffusion tutorial). Note: You should make sure to save this notebook on the same level as another folder named /dif_images. ImageIO will not always create this folder automatically, so you may need to create it manually.At the top of the notebook, we need the following imports and declarations along with a specification of the simulate function that will drive our Gray-Scott simulation.import matplotlib.pyplot as pltimport numpy as npimport timefrom scipy import signalimport imageio%matplotlib inline'''Simulate functionDescription: Simulate the Gray-Scott model for numIter iterations.Inputs: - numIter: number of iterations - A: prey matrix - B: predator matrix - f: feed rate - k: kill rate - dt: time constant - dA: prey diffusion constant - dB: predator diffusion constant - lapl: 3 x 3 Laplacian matrix to calculate diffusionOutputs: - A_matrices: Prey matrices over the course of the simulation - B_matrices: Predator matrices over the course of the simulation'''The Simulate function will take in the same parameters as the Diffuse function from the diffusion tutorial, but it will also take parameters f and k corresponding to the Gray-Scott feed and kill parameters, respectively. 
The simulation is in fact very similar to the diffusion notebook except for a very slight change that we make by adding the feed, kill, and predator-prey reactions when we update the matrices A and B containing the concentrations of the two particles over all the cells in the grid.images = []def Simulate(numIter, A, B, f, k, dt, dA, dB, lapl, plot_iter): print("Running Simulation") start = time.time() # Run the simulation for iter in range(numIter): A_new = A + (dA * signal.convolve2d(A, lapl, mode='same', boundary='fill', fillvalue=0) - (A * B * B) + (f * (1-A))) * dt B_new = B + (dB * signal.convolve2d(B, lapl, mode='same', boundary='fill', fillvalue=0) + (A * B * B) - (k * B)) * dt A = np.copy(A_new) B = np.copy(B_new) if (iter % plot_iter == 0): plt.clf() plt.imshow((B / (A+B)),cmap='Spectral') plt.axis('off') now = time.time() # print("Seconds since epoch =", now-start) # plt.show() filename = 'gs_images/gs_'+str(iter)+'.png' plt.savefig(filename) images.append(imageio.imread(filename)) return A, BThe following parameters will establish the grid size, the number of iterations we will range through, and where the predators and prey will start.# _*_*_*_*_*_*_*_*_* GRID PROPERTIES *_*_*_*_*_*_*_*_*_*grid_size = 101 # Needs to be oddnumIter = 5000seed_size = 11 # Needs to be an odd numberA = np.ones((grid_size,grid_size))B = np.zeros((grid_size,grid_size))# Seed the predatorsB[int(grid_size/2)-int(seed_size/2):int(grid_size/2)+int(seed_size/2)+1, \int(grid_size/2)-int(seed_size/2):int(grid_size/2)+int(seed_size/2)+1] = \np.ones((seed_size,seed_size))The remaining parameters establish feed rate, kill rate, time interval, diffusion rates, the Laplacian we will use, and how often to draw a board to an image when rendering the animation.# _*_*_*_*_*_*_*_*_* SIMULATION VARIABLES *_*_*_*_*_*_*_*_*_*f = 0.055k = 0.117dt = 1.0dA = 1.0dB = 0.5lapl = np.array([[0.05, 0.2, 0.05],[0.2, -1.0, 0.2],[0.05, 0.2, 0.05]])plot_iter = 50After adding the code below to the bottom of 
the notebook, we are now ready to save our file and run the program to generate the animations.Simulate(numIter, A, B, f, k, dt, dA, dB, lapl, plot_iter)imageio.mimsave('gs_images/gs_movie.gif', images)When you run your simulation, you should see an image analogous to the one in the diffusion simulation, but with much more complex behavior since we have added reactions to our model. Try changing the feed and kill rates very slightly (e.g., by 0.01). How does this affect the end result of your simulation? What if you keep making changes to these parameters? Slight changes should produce images similar to the ones below.In the main text, we will discuss how, much as we saw with the particle-based reaction-diffusion model, slight changes to the critical parameters in our model can produce vast differences in the beautiful patterns that emerge.Return to main text"
} ,
{
"title" : "Introduction: Life is Random",
"category" : "",
"tags" : "",
"url" : "/prologue/",
"date" : "",
"content" : "by Noah Lee, Mert Inan, and Phillip CompeauQuantum physics tells us that everything that happens in the universe ultimately depends on the interaction of tiny particles. Yet it is difficult for beings like ourselves to acknowledge this fundamental truth of the universe when our experience of existence is guided by “macro” phenomena.Although you seem like a coherent, single being, you are nothing more than a skin-covered bag of trillions of cells acting largely independently. Over half of these cells aren’t even yours! They correspond to bacteria that make up a couple of kilograms of your mass.Every memory you have ever had, however powerful, can be encoded by a particular exchange of ions across neural synapses deep within your nervous system. Even the behavior of an individual cell within you is driven almost chiefly by the action of molecules that sense their environment and cause chemical reactions within the cell to evince what we see as a cellular change.This perspective on your existence may seem desparately cold, but its purpose is to enforce the point that we are all already used to inferring high-level behavior from a symphony of much more low-level processes that are invisible to us. Yet what makes the whole affair seem even crueler is that this symphony is often based upon randomness. Not only is there no sentient being driving the molecular interactions in our cells, but these interactions rely upon interactions fueled by the random movement of particles.Throughout this course, we will attempt to make high-level inferences about biological systems by building simple models of these systems that often include randomness as a key feature of the model. 
We will see that the fact that a system is driven by randomness and simple rules does not mean that it cannot exhibit emergent behavior that is sophisticated, even beautiful.We hope that you will join us for this course, which is divided into five modules that cover different aspects of biological modeling. By clicking “next lesson” below, you can continue reading the prologue, a shorter module that serves as a warmup to this course. It starts with an innocent enough question: “how did the zebra get its stripes?”Next lesson"
} ,
{
"title" : "An Introduction to Random Walks",
"category" : "",
"tags" : "",
"url" : "/prologue/random-walk",
"date" : "",
"content" : "The wanderlust of a single particleWe have mentioned that our experience of the world is often influenced by the random interactions of objects that we cannot see. Our goal is to see how randomness can help us understand how zebras get their stripes, and to this end, we will consider a simpler phenomenon by observing the movement of a single particle taking a random walk in a two-dimensional plane. At each step, the particle moves a single unit in a randomly chosen direction.STOP: After n steps, how far do you think the particle will have traveled (as the crow flies) from its starting point?Let’s generate an animation of a particle following a random walk. The video below shows a randomly walking particle, shown in red, taking 1000 steps.The distance that the particle wanders from its starting point may surprise you. And yet the astute scientist would point out that this is just a single particle; perhaps the typical particle would be much more of a homebody.The particle’s movements are random, but the average-case behavior of the particle can be predicted, as the following theorem indicates. For mathematics lovers, we explain why this theorem is true in an optional bonus section at the bottom of this page.Random Walk Theorem: After n steps of unit length in a random walk, a particle will on average find itself a distance of approximately \(\sqrt{n}\) from its origin.From one particle to manyThe Random Walk Theorem does not say that after n steps a particle will be exactly \(\sqrt{n}\) from the origin, any more than we would expect that in flipping a coin 2,000 times the coin will come up heads exactly 1,000 times. Yet the statement about the particle’s average behavior is powerful. 
If we animate the action of many independent particles following random walks, then we will see that although some particles hug their starting point and some wind up far away, most particles steadily move outward as the simulation continues.If you are interested in seeing how to build this random walk simulation as an introduction to the software that we will soon be using for biological modeling, then please visit the following software tutorial. This tutorial uses CellBlender, an add-on to the popular open-source graphics program Blender that allows us to create and visualize biological models.We have designed this course so that you can appreciate the key ideas behind the biological models that we build without following software tutorials. But we also provide these tutorials so that you can explore the modeling software that we have used to generate our conclusions. If you find this software helpful, perhaps you can even use this software in your own work!Visit tutorialBrownian motion: big numbers in small spacesOur experience of the world confirms what we see in the animations produced by CellBlender. The seemingly random movements of particles suspended in a medium via Brownian motion will cause those particles to move away from their starting point, even if the concentration of these particles is uniform. We understand, for example, that an infected COVID-19 patient can infect many others in an enclosed space in a short time frame. To take a less macabre example, we also know that when a cake is baking in the oven at home, we will not need to wait long for wonderful smells to waft outward from the kitchen.Why should a scientist care about random walks? Later in this course, we will see that the random walk model is at the core of a simple but powerful approach that bacteria like E. coli use to explore their environment in the hunt for food. 
In the next lesson, we will see that mimicking the random movements of particles will be important for building a biological model in which we allow particles to move naturally and interact when they collide.Before continuing, we point you to a beautiful animation illustrating just how far a single randomly moving particle can travel in a relatively small amount of time. This animation, which shows a simulation of the path taken by a glucose molecule as the result of Brownian motion, starts at 6:10 of the following excellent instructional video developed by the late Joel Stiles.Next lesson(Optional) A proof of the Random Walk TheoremThe Random Walk Theorem states that the average distance that a randomly walking particle will find itself from its starting point after taking n steps of unit length is \(\sqrt{n}\). Below, we provide a justification for why this is true for interested learners who are familiar with probability.Let xi denote the random variable corresponding to the vector of the particle’s i-th step. The distance d traveled by the particle can be represented by the sum of all the xi,\[d = \mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n} \,.\]We will show that the expected value of d2 is equal to n. First note that\[d^2 = (\mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n}) \cdot (\mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n})\,.\]After expansion, we obtain\[\begin{align*}d^2 = ~ & \mathbf{x_1} \cdot (\mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n})\\+ & \mathbf{x_2} \cdot (\mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n})\\+ & \cdots\\+ & \mathbf{x_n} \cdot (\mathbf{x_1} + \mathbf{x_2} + \cdots + \mathbf{x_n}) \,.\end{align*}\]Finally, we rearrange this equation so that the terms \(\mathbf{x_1} \cdot \mathbf{x_1}\), \(\mathbf{x_2} \cdot \mathbf{x_2}\), and so on occur first, and the remaining terms appear last. 
This allows us to write d2 as follows.\[d^2 = \sum_{i=1}^n (\mathbf{x_i} \cdot \mathbf{x_i}) + \sum_{i \neq j} (\mathbf{x_i} \cdot \mathbf{x_j})\, .\]The right side of this equation is the sum of n2 dot products. When we take the expectation of both sides, we can apply a fundamental theorem called the “linearity of expectation”, which states that for any two random variables \(x\) and \(y\), the expectation of their sum \(\mathbb{E}(x + y)\) is equal to the sum of the corresponding expectations \(\mathbb{E}(x) + \mathbb{E}(y)\):\[\mathbb{E}(d^2) = \sum_{i=1}^n \mathbb{E}(\mathbf{x_i} \cdot \mathbf{x_i}) + \sum_{i \neq j} \mathbb{E}(\mathbf{x_i} \cdot \mathbf{x_j})\, .\]For any i, \(\mathbf{x_i} \cdot \mathbf{x_i}\) is just the squared length of the unit vector \(x_i\), which is equal to 1. On the other hand, the expected value of the dot product of any two random unit vectors is zero. Therefore, the right side of the above equation can be simplified to give the equation\[\mathbb{E}(d^2) = \sum_{i=1}^n 1 + \sum_{i \neq j} 0 = n + 0 = n\, ,\]which is what we set out to prove.A couple of notes before we continue. First, we did not use anything about the random walk being two-dimensional in this proof; therefore, it holds whether our particle is walking in two, three, or any number of dimensions.Second, we technically did not show that the expected value of \(d\) is \(\sqrt{n}\), but rather that the expected value of \(d^2\) is \(n\). It is not true that \(\mathbb{E}(d)\) is equal to \(\sqrt{n}\), but rather that as \(n\) grows, \(\mathbb{E}(d)\) grows like \(c \cdot \sqrt{n}\) for some constant factor \(c\) that depends on the dimension. A proof is beyond the scope of this course, but it can be shown that for our two-dimensional walk, as \(n\) goes off to infinity, \(\mathbb{E}(d)\) tends toward \(\frac{\sqrt{\pi}}{2} \cdot \sqrt{n}\). Who knew that the mathematics of random walks could be so complicated!"
} ,
{
"title" : "Alan Turing and the Zebra's Stripes",
"category" : "",
"tags" : "",
"url" : "/prologue/turing",
"date" : "",
"content" : "Turing machines and the foundations of computer scienceOur story begins with the unlikeliest of major characters: Alan Turing. If you have heard of Turing, then you might be surprised as to why he would appear in a course on biological modeling.Alan Turing in 1951. © National Portrait Gallery, London.Turing was a genius cryptographer during World War II and helped break several German ciphers. But his most famous scientific contribution was a 1936 paper in which he introduced what has come to be known as a Turing machine1. This hypothetical computer consists of an infinitely long tape of cells and a reader that can read one cell at a time. Each cell consists of only a single number, and the machine can move one cell at a time, reading and rewriting cells according to a finite collection of internal rules. Turing’s major insight was that such a machine, though simple, is enormously powerful. Nearly a century after his work, any task that a computer performs, from the device you are using to read this to the world’s most powerful supercomputer, could be implemented by a Turing machine.You may be shocked that a computer can ultimately be represented by such a simple machine, one that Joseph Weizenbaum called nothing more than “pebbles on toilet paper”2. Although they are not our focus here, if Turing machines interest you, then we include an excellent introductory video on Turing machines below, including a demonstration of how a Turing machine can be used to solve an example problem.Why spend time discussing Turing’s foundational work on theoretical computer science? Because this work enforces a theme of this course that we mentioned in the introduction, in that a computing machine built upon rules that are very simple can nevertheless produce emergent behavior that seems complex. 
We now will visit this theme in the context of biological modeling.Turing the biochemistTwo years before his untimely demise in 1954, Turing published his only paper on biochemistry, which centered on the question that we introduced in the introduction: “Why do zebras have stripes?”3Turing was not approaching this question from the perspective of why zebras have evolved to have stripes — this was unsolved in Turing’s time, and recent research has indicated that the stripes may be helpful in warding off flies.4 Rather, Turing reasoned that just as computers can be represented by a simple machine, there must be some simple set of molecular “rules” that cause the stripes to appear on a zebra’s coat.In the next two lessons, we will introduce a particle simulation model based on Turing’s ideas. We will explore how this model can be tweaked to explain not just the appearance of the zebra’s stripes but also the leopard’s spots.Next lesson Turing, Alan M. (1936), “On Computable Numbers, with an Application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society, Ser. 2, Vol. 42: 230-265. ↩ Weizenbaum, Joseph (1976), Computer Power and Human Reason (New York: W.H. Freeman). ↩ Turing, Alan (1952). “The Chemical Basis of Morphogenesis” (PDF). Philosophical Transactions of the Royal Society of London B. 237 (641): 37–72. Bibcode:1952RSPTB.237…37T. doi:10.1098/rstb.1952.0012. JSTOR 92463. ↩ Caro, T., Izzo, A., Reiner, R. C., Walker, H., & Stankowich, T. (2014). The function of zebra stripes. Nature Communications, 5(1), 1–10. https://doi.org/10.1038/ncomms4535 ↩ "
} ,
{
"title" : "Software Tutorial: Generating Turing Patterns with a Reaction-Diffusion Simulation in CellBlender",
"category" : "",
"tags" : "",
"url" : "/prologue/turing-cellblender",
"date" : "",
"content" : "Load the CellBlender_Tutorial_Template.blend file that you generated in the Random Walk Tutorial. You may also download the complete file here. Save this file as a new file named turing_pattern.blend. The completed tutorial is also available here.We will first visit CellBlender > Molecules and create the B molecules, as shown in the screenshot below. Click the + button. Select a color (such as red). Name the molecule B. Under molecule type, select surface molecule. Add a diffusion constant of 3e-6. The diffusion constant indicates how many units to move the particle every time unit. (We will specify the time unit below at runtime.) Up the scale factor to 2 (click and type 2 or use the arrows).Then, repeat the above steps to make sure that all of the following molecules are entered. We use a molecule named Hidden to represent a “hidden” molecule that will be used to generate A molecules. Molecule Name Molecule Type Color Diffusion Constant Scale Factor B Surface Red 3e-6 3 A Surface Green 6e-6 3 Hidden Surface Blue 1e-6 0 Now visit CellBlender > Molecule Placement to set the following sites for releasing molecules of each of the three types. First, we will release the hidden molecules across the region so that any new A particles will be produced uniformly. Click the + button. Select or type in the molecule Hidden. Type in the name of the Object/Region Plane. Set the Quantity to Release as 1000.Then, repeat the above steps to release an initial quantity of A molecules as well, using the following table. Molecule Name Object/Region Quantity to Release Hidden Plane 1000 A Plane 6000 We are going to release an initial collection of B particles in a cluster in the center of the plane. To do so, we will need a very specific initial release of these particles, and so we will not be able to use the Molecule Placement tab. For this reason, we need to write a Python script to place these molecules, shown below. 
(Don’t worry if you are not comfortable with Python.) import cellblender as cb dm = cb.get_data_model() mcell = dm['mcell'] rels = mcell['release_sites'] rlist = rels['release_site_list'] point_list = [] for x in range(10): for y in range(10): point_list.append([x/100,y/100,0.0]) for x in range(10): for y in range(10): point_list.append([x/100 - 0.5,y/100 - 0.5,0.0]) for x in range(10): for y in range(10): point_list.append([x/100 - 0.8,y/100,0.0]) for x in range(10): for y in range(10): point_list.append([x/100 + 0.8,y/100 - 0.8,0.0]) new_rel = { 'data_model_version' : "DM_2015_11_11_1717", 'location_x' : "0", 'location_y' : "0", 'location_z' : "0", 'molecule' : "B", 'name' : "pred_rel", 'object_expr' : "arena", 'orient' : "'", 'pattern' : "", 'points_list' : point_list, 'quantity' : "400", 'quantity_type' : "NUMBER_TO_RELEASE", 'release_probability' : "1", 'shape' : "LIST", 'site_diameter' : "0.01", 'stddev' : "0" } rlist.append ( new_rel ) cb.replace_data_model ( dm )Locate the Outliner pane on the top-right of the Blender screen. On the left of the view button in the Outliner pane, there is a code tree icon.Click this icon and choose Text Editor. To create a new file for our code, click the + button. Copy and paste the code into the text editor and save it with the name pred_center.py.Next visit CellBlender > Scripting > Data-Model Scripting > Run Script, as shown in the following screenshot. Select Internal from the Data-Model Scripting menu and click the refresh button. Click the filename entry area next to File and enter pred_center.py. Click Run Script to execute it.You should see that another placement site called pred_rel has appeared in the Molecule Placement tab.Next go to CellBlender > Reactions to create the reactions that will drive the system. Click the + button. Under reactants, type Hidden; (note: the semi-colon is important). 
Under products, type Hidden; + A; Set the forward rate as 1e5.Repeat these steps to ensure that we have all of the following reactions. Reactants Products Forward Rate Hidden; Hidden; + A; 1e5 B; NULL 1e5 B; + B; + A; B; + B; + B; 1e1 We are now ready to run our simulation. To do so, visit CellBlender > Run Simulation and select the following options: Set the number of iterations to 200. Ensure the time step is set as 1e-6. Click Export & Run.Once the run is complete, save your file.We can also now visualize our simulation. Click CellBlender > Reload Visualization Data. You have the option of watching the animation within the Blender window by clicking the play button at the bottom of the screen.If you like, you can export this animation by using the following steps: Click the movie tab. Scroll down to the file name. Select a suitable location for the file. Select the file type you would like (we suggest FFmpeg_video). Click Render > OpenGL Render Animation.The movie will begin playing, and when the animation is complete, the movie file should be in the folder location you selected.You may be wondering how the parameters in the above simulations were chosen. The fact of the matter is that for many choices of these parameters, we will obtain behavior that does not produce an animation as interesting as what we found in this tutorial. Furthermore, try making slight changes to the feed and kill rates in the CellBlender reactions (e.g., multiplying one of them by 1.25) and watching the animation. How does a small change in parameters cause the animation to change?As we return to the main text, we will discuss how the patterns that we observe change as we make slight changes to these parameters. What biological conclusion can we draw from this phenomenon?Return to main text"
} ,
{
"title" : "Software Tutorial: Building a Diffusion Cellular Automaton with Jupyter Notebook",
"category" : "",
"tags" : "",
"url" : "/prologue/tutorial-diffusion",
"date" : "",
"content" : "In this tutorial, we will use Python to build a Jupyter notebook. We suggest only following the tutorial closely if you are familiar with Python or programming in general. If you have not installed Python, then the following software and packages will need to be installed: Installation Link Version1 Check Install Python3 3.7 python –version Jupyter Notebook 4.4.0 jupyter –version matplotlib 2.2.3 conda list or pip list numpy 1.15.1 conda list or pip list scipy 1.1.0 conda list or pip list imageio 2.4.1 conda list or pip list You can read more about various installation options here or here.Once you have Jupyter Notebook installed, create a new notebook file called diffusion_automaton.ipynb.Note: You will need to save this file on the same level as another folder named /dif_images. ImageIO will not always create this folder automatically, so you may need to create it manually.You may also download the completed tutorial here.We are now ready to simulate our automaton representing the diffusion of two particle species: a prey (A) and a predator (B). Enter the following into our notebook.import matplotlib.pyplot as pltimport numpy as npimport timefrom scipy import signalimport imageio%matplotlib inlineimages = []To simulate the diffusion process, we will rely upon an imported convolution function. The convolve function will use a specified 3 x 3 laplacian matrix to simulate diffusion as discussed in the main text. Specifically, the convolve function in this case takes two matrices, mtx and lapl, and uses lapl as a set of multipliers for each square in mtx. We can see this operation in action in the image below.A single step in the convolution function which takes the first matrix and adds up each cell multiplied by the number in the second matrix. 
Here we see (0 * 0) + (2 * ¼) + (0 * 0) + (3 * ¼) + (1 * -1) + (2 * ¼) + (1 * 0) + (1 * ¼) + (1 * 0) = 1Because we’re trying to describe the rate of diffusion over this system, the values in the 3 x 3 Laplacian excluding the center sum to 1. In our code, the value in the center is -1 because we’ve specified the change in the system with the convolution function, i.e., the matrix dA, which we then add to the original matrix A. Thus the total sum of the Laplacian is 0, which means the total change in the number of molecules due to diffusion is 0, even if the molecules are moving to new locations. We don’t want any new molecules created due to just diffusion! (This would violate the law of conservation of mass.)We are now ready to write a Python function Diffuse that we will add to our notebook. This function will take a collection of parameters: numIter: the number of steps to run our simulation A, B: matrices containing the respective concentrations of prey and predators in each cell dt: the unit of time dA, dB: diffusion rates for prey and predators, respectively lapl: our 3 x 3 Laplacian matrix plot_iter: the number of steps to “skip” when animating our simulationdef Diffuse(numIter, A, B, dt, dA, dB, lapl, plot_iter): print("Running Simulation") start = time.time() # Run the simulation for iter in range(numIter): A_new = A + (dA * signal.convolve2d(A, lapl, mode='same', boundary='fill', fillvalue=0)) * dt B_new = B + (dB * signal.convolve2d(B, lapl, mode='same', boundary='fill', fillvalue=0)) * dt A = np.copy(A_new) B = np.copy(B_new) if (iter % plot_iter == 0): plt.clf() plt.imshow((B / (A+B)),cmap='Spectral') plt.axis('off') now = time.time() # print("Seconds since epoch =", now-start) # plt.show() filename = 'dif_images/diffusion_'+str(iter)+'.png' plt.savefig(filename) images.append(imageio.imread(filename)) return A, BThe following parameters will set up our problem space by defining the grid size, the number of iterations we will range through, and establishing the 
initial matrices A and B.# _*_*_*_*_*_*_*_*_* GRID PROPERTIES *_*_*_*_*_*_*_*_*_*grid_size = 101 # Needs to be oddnumIter = 10000;seed_size = 11 # Needs to be an odd numberA = np.ones((grid_size,grid_size))B = np.zeros((grid_size,grid_size))# Seed the predatorsB[int(grid_size/2)-int(seed_size/2):int(grid_size/2)+int(seed_size/2)+1, \int(grid_size/2)-int(seed_size/2):int(grid_size/2)+int(seed_size/2)+1] = \np.ones((seed_size,seed_size))The following parameters will establish the time step, the diffusion rates, and how many steps will be between frames of our animation.# _*_*_*_*_*_*_*_*_* SIMULATION VARIABLES *_*_*_*_*_*_*_*_*_*dt = 1.0dA = 0.5dB = 0.25lapl = np.array([[0.05, 0.2, 0.05],[0.2, -1.0, 0.2],[0.05, 0.2, 0.05]])plot_iter = 50Diffuse(numIter, A, B, dt, dA, dB, lapl, plot_iter)imageio.mimsave('dif_images/0diffusion_movie.gif', images)We now are ready to save and run our notebook. When you run the notebook, you should see an animation in which concentrations of predators are spreading out against a field of prey.Above, we used a parameter when plotting called Spectral that uses a color map of this name to color the images of our GIF. A color map assigns different color values to different cell values, where we are plotting a cell’s color based on its value of the concentration of predators divided by the sum of the concentrations of predators and prey. If a cell has a value close to zero for this ratio (meaning very few predators compared to prey), then it will be colored red, while if it has a value close to 1 (meaning many predators), then it will be colored dark blue. The Spectral color map is shown in the figure below.As we return to the main text, we will discuss this animation and extend our model to be able to handle reactions as well as diffusion.Return to main text Other versions may be compatible with this code, but those listed are known to work for this tutorial ↩ "
} ,
{
"title" : "Software Tutorial: Simulating particle diffusion with CellBlender",
"category" : "",
"tags" : "",
"url" : "/prologue/tutorial-random-walk",
"date" : "",
"content" : "Setting up CellBlenderSetting up CellBlender can be done using the default tutorial with just a few changes from the original installation instructions.First, follow the instructions from the CellBlender website, with the following disclaimer.Note: CellBlender requires a previous version of Blender. Use the following two changes to the default installation instructions.First, instead of downloading the newest version of Blender, go to the previous versions tab……and download Blender 2.79b.Second, once Blender is downloaded, the file path may be different from what is shown in the CellBlender tutorial- instead of Blender/2.79/python/bin, the pathway may be something like: blender-2.79/2.79/python/bin. Changing the name of the downloaded Blender folder to match the path in the tutorial may help reduce confusion.Setting up CellBlender simulationsFrom a new Blender file, initialize CellBlender. Delete the existing default cube by right-clicking on the cube to select the cube (an orange outline should be around the cube when it is selected) and pressing the “x” key to delete. Then, in the tab CellBlender > Model Objects, insert a new plane, following the figure below.In CellBlender > Model Objects, click the + symbol to center the cursor. Next press the square “plane” button to create the object. To have CellBlender recognize this object as a model object, press the + button. The name of this object is Plane by default, although you can change this name and edit the color by selecting the color wheel if you like. A slightly transparent coloring will help with visibility but is not necessary.Resizing the render preview window so that objects are visible in the center of the screen is recommended. See the following figure for instructions. Then save your file as CellBlender_Tutorial_Template.blend.From the View menu, select Top to align the view directly overhead. 
With the plane object selected, follow the arrow over to the object parameters menu (the orange cube) and scale the plane by setting the first two values to “1.5”. Then, hover the mouse over the object and either use ctrl + “+” 6 times or the scroll wheel on your mouse to zoom in.Navigating the CellBlender windowThis section will provide images and descriptions for the different components of the Blender window. When a new file has been created, the following figure shows the menu options available.A: This is the window for modules like CellBlender. To start CellBlender, you must click the CellBlender tab and then click the Initialize CellBlender button as shown in the image. This will then display the image shown as “D” in the figure below.B: There are many View tabs throughout the Blender window. Any future tutorials referring to the View tab are referencing this tab.C: This window contains options relating to a selected object.D: This is the CellBlender menu, which opens after CellBlender has been initialized, and contains sub-menus which will be noted as follows: CellBlender > Model Objects. We recommend dragging the edge of the window outward to increase visibility (see box “e” on the image above).Implementing particle diffusionIn CellBlender, load the CellBlender_Tutorial_Template.blend file from the previous section and save your file as random_walk.blend. You may also download the completed tutorial file here.Right click the plane object to ensure it is selected. Visit the object parameters menu (the orange cube) and move the plane by setting the third location value to 1.0 instead of 0.0.Then select CellBlender > Molecules and create the following molecules: Click the + button. Select a color (such as orange). Name the molecule X. Select the molecule type as Surface Molecule. Add a diffusion constant of 1e-6. 
Increase the scale factor to 5 (click and type 5 or use the arrows).Now visit CellBlender > Molecule Placement to set the following sites for molecules to be released: Click the + button. Select or type in the molecule X. Type in the name of the Object/Region Plane. Set the Quantity to Release as 1.Finally, we are ready to run our diffusion simulation. Visit CellBlender > Run Simulation and select the following options: Set the number of iterations to 1000. Ensure the time step is set as 1e-6. Click Export & Run.The simulation should run quickly, and we are ready to visualize the outcome of the simulation. To do so, visit CellBlender > Reload Visualization Data. You have the option of watching the animation within the Blender window by clicking the play button at the bottom of the screen, as indicated in the figure below. Then, save your file.You can also save and export the movie of your animation using the following steps: Click the movie tab. Scroll down to the file name. Select a suitable location for your file. Select your favorite file format (we suggest FFmpeg_video). Click Render > OpenGL Render Animation.The movie will begin playing, and when the animation is complete, the movie file should be found in the folder location you selected.Now that we have run and visualized our diffusion, we will head back to the main text, where we will continue on with our discussion of how the diffusion of particles can help us find Turing patterns.Return to main text"
} ,
{
"title" : "Segmenting White Blood Cells",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/segmentation",
"date" : "",
"content" : "Image segmentation and the RGB color modelWe begin our work by programming a computer to “see” a stained WBC nucleus within a larger image containing RBCs, like the one in the figure below, reproduced from the introduction. The more general problem of identifying objects within an image is called segmentation.The granulocyte presented in the introduction (having ID 3 in our dataset).Many different approaches for image segmentation have been developed, but no one has yet developed a single algorithm that could be used in all contexts. We therefore will apply a maxim that is more general than its application to biological modeling and that will recur throughout this module, which is to identify the key features special to this dataset, and then convert these features into instructions that a computer can follow.In particular, we ask ourselves what makes the WBC nucleus so easy for a human to spot in the blood cell images. You may be screaming already, “It is dark purple!” And this is a very good idea. But to train a computer to segment images by color, we need first to understand how the computer represents color in images.In the RGB color model, every rectangular pixel on a computer screen receives a solid color which is formed as a mixture of the three primary colors of light: red, green, and blue (hence the acronym “RGB”). The amount of each color in a pixel is expressed as an integer between 0 and 255, respectively, where larger integers correspond to larger amounts of the color. Some simple colors are shown in the figure below along with their RGB equivalents; for example, magenta corresponds to equal parts red and blue. Note that a color like (128, 0, 0) contains only red but appears duskier than (256, 0, 0) because the red has not been “turned on” fully.A collection of colors along with their RGB codes. 
Note that this table corresponds to mixing colors of light instead of pigment, which causes some strange effects; for example, yellow is formed by mixing equal parts red and green, and cyan is formed by mixing equal parts blue and green. The last six colors appear muted because they only receive half of a given color value compared to a color that receives the full 255 units. If all three colors are mixed in equal proportions, then we obtain a color on the gray scale between white (maximum amounts of the colors) and black (no color). Source: Excel at Finance.This observation gives us an idea for finding a WBC nucleus. Why don’t we scan through the pixels in a blood cell image and determine the amounts of each primary color in different parts of the image? We can then “turn off” any pixels whose color codes are not similar to the pixels inside the nucleus.STOP: You can find a color picker in Utilities > Digital Color Meter (Mac OS X) or by using ShareX (Windows). Open your color picker, and hover the picker over different parts of the granulocyte image above. What are the typical RGB values for the WBC nucleus, and how do these RGB values differ from other parts of the cell?Binarizing an image based on a color thresholdWhen using a color picker, we see that a stained WBC nucleus has more blue than the surrounding RBCs, which is unsurprising. We can then binarize our image by turning a pixel white if its blue value is at or above some threshold and black otherwise. The result for a threshold value of 153 is shown in the figure below. We can’t clearly see the WBC nucleus in this binarized image because although the nucleus has high blue values, so does the whitish background of the image (remember that colors close to white are formed by mixing high percentages of red, green, and blue).A binarized version of our granulocyte from the introduction (having image ID 3 in our dataset). 
A pixel is colored white if it has a blue channel value of 153 or greater, and the pixel is colored black otherwise. The region with the nucleus is shown in white but is not clearly visible because much of the background of the image, which is very light, also has a high blue value (remember that mixing all three colors in equal proportions yields white).STOP: How might we modify our segmentation approach to perform a binarization that identifies the WBC nucleus more effectively?Before we give up, let’s consider the other two primary colors. The blue channel was unable to distinguish between the image background and the WBC nucleus, but you can verify with a color picker that the green content of nuclear pixels is typically much lower than that of the background. The WBC nucleus also tends to have a lower red content than both the RBCs and the background. So, if we binarize the original image using a green threshold and then (separately) a red threshold, we obtain the two images in the figure below. Two more binarized versions of the neutrophil image from the figure above (left), based on the green and red values. For both of these colors, the WBC nucleus tends to have lower values than other parts of the original image. (Left) A binarization in which a pixel is turned white if it has a green value less than or equal to 153. (Right) A binarization in which a pixel is turned white if it has a red value less than or equal to 166.We have found a signal! It would seem that we should work with the image based on the red threshold, since the nucleus there is the clearest. However, each threshold was able to eliminate unnecessary parts of the image from consideration. For example, note the white blob in the top left of the binarized image based on the red channel. 
Although the red channel did not exclude this area, the blue channel did; this same region is black in the preceding figure.This insight gives us an idea; let’s produce a fourth image for which a pixel is white only if it is white in all three binarized images. In the following tutorial, we will build an R pipeline that does just this for all of our blood cell images to produce binarized WBC nuclei.Visit tutorialSuccessful segmentation is subject to parametersIf you followed the above tutorial, then you might be tempted to celebrate, since it seems that we have resolved our first main objective of identifying WBCs. Now that we can identify WBCs, this means that we can count the number of WBCs in a given sample without the need for any human labor.Indeed, if we segment all of the images in the dataset via the same process, then we typically obtain a nice result, as indicated in the figure below for the monocyte and lymphocyte example images presented in the introduction. Image segmentation of the monocyte (left) and lymphocyte (right) corresponding to IDs 15 and 20 in the provided dataset.Yet this is not to say that our segmentation pipeline is perfect; the figure below illustrates that we may not correctly parse out all of the nucleus. (Left) An image of a WBC (ID: 167) whose nucleus is not correctly identified during segmentation (right) using the parameters from the tutorial.STOP: Play around with the threshold parameters for red, green, and blue values from the tutorial. Can you find a better choice of parameters? How should we quantify whether one collection of parameters is better than another?We can continue to tweak our threshold parameters, but you can verify that our relatively simple segmentation program has successfully excised most of the WBC nuclei from our dataset. We now will move on to our second goal of classifying the WBC nuclei into the three main families constituting WBCs."
} ,
{
"title" : "Solutions",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/home_solutions",
"date" : "",
"content" : "How do E. coli respond to repellents?Exercise 1In contrast to that CheY phosphorylations decrease and tumbling becomes less frequent when the cell senses higher attractant concentrations, when the cell senses more repellents there should be more frequent tumbling. The decreased tumbling frequency should be a result of increased CheY phosphorylations. The cell should always be able to adapt to the current concentrations, therefore we also expect the CheY phosphoryaltions be restored when adpating.Exercise 2Update reaction rule for ligand-receptor binding fromBoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*0.2toBoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*5The complete code (you can download a completed BioNetGen file here: exercise_repel.bngl):begin modelbegin molecule types L(t) #ligand molecule T(l,Phos~U~P) #receptor complex CheY(Phos~U~P) CheZ()end molecule typesbegin parameters NaV2 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 L0 5e3 #number of ligand molecules T0 7000 #number of receptor complexes CheY0 20000 CheZ0 6000 k_lr_bind 8.8e6/NaV2 #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_T_phos 15 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV2 #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV2 #Z dephosphoryaltes Yend parametersbegin reaction rules LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Free vs. 
ligand-bound receptor complexes autophosphorylate at different rates FreeTP: T(l,Phos~U) -> T(l,Phos~P) k_T_phos BoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*5 YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDeps: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephosend reaction rulesbegin seed species L(t) L0 T(l,Phos~U) T0*0.8 T(l,Phos~P) T0*0.2 CheY(Phos~U) CheY0*0.5 CheY(Phos~P) CheY0*0.5 CheZ() CheZ0end seed speciesbegin observables Molecules phosphorylated_CheY CheY(Phos~P) Molecules phosphorylated_CheA T(Phos~P) Molecules bound_ligand L(t!1).T(l!1)end observablesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>3, n_steps=>100})The simulation outputs:What if there are multiple attractant sources?Exercise 1:In molecule types and observables, update L(t) and T(l,r,Meth~A~B~C,Phos~U~P) to L(t,Lig~A~B) and T(l,r,Lig~A~B,Meth~A~B~C,Phos~U~P), where A and B represent the two ligand types. Update the reaction ruleLR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_distoL1R: L(t,Lig~A) + T(l,Lig~A) <-> L(t!1,Lig~A).T(l!1,Lig~A) k_lr_bind, k_lr_disL2R: L(t,Lig~B) + T(l,Lig~B) <-> L(t!1,Lig~B).T(l!1,Lig~B) k_lr_bind, k_lr_disAlso update the seed species by splitting the initial receptor concentration equally between the two ligand types.You can download a completed BioNetGen file here: exercise_twoligand.bngl.Exercise 2:To wait for adaptation to ligand A, we could replace the forward reaction rate with this rule: the rate constant is 0 until the cell has adapted to A. We could run the simulation without B first and observe the equilibrium methylation states, and use this for deciding whether the cell is adapted to A. (Why not equilibrium concentrations of free A?) 
One possible implementation is the following: replaceL1R: L(t,Lig~A) + T(l,Lig~A) <-> L(t!1,Lig~A).T(l!1,Lig~A) k_lr_bind, k_lr_disL2R: L(t,Lig~B) + T(l,Lig~B) <-> L(t!1,Lig~B).T(l!1,Lig~B) k_lr_bind, k_lr_diswithL1R: L(t,Lig~A) + T(l,Lig~A) <-> L(t!1,Lig~A).T(l!1,Lig~A) k_lr_bind, k_lr_disL2R: L(t,Lig~B) + T(l,Lig~B) <-> L(t!1,Lig~B).T(l!1,Lig~B) l2rate(), k_lr_disand l2rate() is a function defined as (remember to define it before reaction rules)begin functions l2rate() = if(high_methyl_receptor>1.2e3,k_lr_bind,0)end functionsThe complete code:begin modelbegin compartments EC 3 100 #um^3 PM 2 1 EC #um^2 CP 3 1 PM #um^3end compartmentsbegin molecule types L(t,Lig~A~B) T(l,r,Lig~A~B,Meth~A~B~C,Phos~U~P) CheY(Phos~U~P) CheZ() CheB(Phos~U~P) CheR(t)end molecule typesbegin observables Molecules bound_ligand L(t!1).T(l!1) Molecules phosphorylated_CheY CheY(Phos~P) Molecules low_methyl_receptor T(Meth~A) Molecules medium_methyl_receptor T(Meth~B) Molecules high_methyl_receptor T(Meth~C) Molecules phosphorylated_CheB CheB(Phos~P)end observablesbegin parameters NaV2 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 miu 1e-6 L0 1e6 T0 7000 CheY0 20000 CheZ0 6000 CheR0 120 CheB0 250 k_lr_bind 8.8e6/NaV2 #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_TaUnbound_phos 7.5 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV2 #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV2 #Z dephosphorylates Y k_TR_bind 2e7/NaV2 #Receptor-CheR binding k_TR_dis 1 #Receptor-CheR dissociation k_TaR_meth 0.08 #CheR methylates receptor complex k_B_phos 1e5/NaV2 #CheB phosphorylation by receptor complex k_B_dephos 0.17 #CheB autodephosphorylation k_Tb_demeth 5e4/NaV2 #CheB demethylates receptor complex k_Tc_demeth 2e4/NaV2 #CheB demethylates receptor complexend parametersbegin functions l2rate() = if(high_methyl_receptor>1.2e3,k_lr_bind,0)end functionsbegin reaction rules L1R: L(t,Lig~A) + T(l,Lig~A) <-> L(t!1,Lig~A).T(l!1,Lig~A) k_lr_bind, k_lr_dis L2R: 
L(t,Lig~B) + T(l,Lig~B) <-> L(t!1,Lig~B).T(l!1,Lig~B) l2rate(), k_lr_dis #L3R: L(t,Lig~T) + T(l,Lig~O) <-> L(t!1,Lig~O).T(l!1,Lig~O) l2rate(), k_lr_dis #Receptor complex (specifically CheA) autophosphorylation #Rate dependent on methylation and binding states #Also on free vs. bound with ligand TaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phos TbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1 TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8 TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0 TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8 TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6 #CheY phosphorylation by T and dephosphorylation by CheZ YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDep: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos #CheR binds to and methylates receptor complex #Rate dependent on methylation states and ligand binding TRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_dis TaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_meth TbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1 TaRLigandMeth: T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30 TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3 #CheB is phosphorylated by receptor complex, and autodephosphorylates CheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phos CheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephos #CheB demethylates receptor complex #Rate dependent on methylation states TbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demeth TcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demeth end reaction rulesbegin seed species @EC:L(t,Lig~A) L0 @EC:L(t,Lig~B) L0 
@PM:T(l,r,Lig~A,Meth~A,Phos~U) T0*0.84*0.9*0.5 @PM:T(l,r,Lig~A,Meth~B,Phos~U) T0*0.15*0.9*0.5 @PM:T(l,r,Lig~A,Meth~C,Phos~U) T0*0.01*0.9*0.5 @PM:T(l,r,Lig~A,Meth~A,Phos~P) T0*0.84*0.1*0.5 @PM:T(l,r,Lig~A,Meth~B,Phos~P) T0*0.15*0.1*0.5 @PM:T(l,r,Lig~A,Meth~C,Phos~P) T0*0.01*0.1*0.5 @PM:T(l,r,Lig~B,Meth~A,Phos~U) T0*0.84*0.9*0.5 @PM:T(l,r,Lig~B,Meth~B,Phos~U) T0*0.15*0.9*0.5 @PM:T(l,r,Lig~B,Meth~C,Phos~U) T0*0.01*0.9*0.5 @PM:T(l,r,Lig~B,Meth~A,Phos~P) T0*0.84*0.1*0.5 @PM:T(l,r,Lig~B,Meth~B,Phos~P) T0*0.15*0.1*0.5 @PM:T(l,r,Lig~B,Meth~C,Phos~P) T0*0.01*0.1*0.5 @CP:CheY(Phos~U) CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>700, n_steps=>400})The simulation outputs:Exercise 3:Define ligand_center1 = [1500, 1500] and ligand_center2 = [-1500, 1500]. Since we are considering two gradients, we can add up the ligand concentrations. We can replace our calc_concentration(pos) function with def calc_concentration(pos): dist1 = euclidean_distance(pos, ligand_center1) dist2 = euclidean_distance(pos, ligand_center2) exponent1 = (1 - dist1 / origin_to_center) * (center_exponent - start_exponent) + start_exponent exponent2 = (1 - dist2 / origin_to_center) * (center_exponent - start_exponent) + start_exponent return 10 ** exponent1 + 10 ** exponent2 Is the actual tumbling reorientation used by E. coli smarter than our model?Now, for sampling the new direction, we need to consider the past concentration and the current concentration the bacterium experiences. Since the new direction is also dependent on the last direction, we also need to record the current direction.Therefore, for our tumble_move() function, we would consider three inputs: curr_direction, curr_conc, past_conc.
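As an aside, the two-gradient concentration field described above can be sanity-checked with a self-contained sketch. The two centers come from the text; origin_to_center is taken as the distance from the origin to either center, while start_exponent and center_exponent are hypothetical stand-ins for the constants defined in the main text:

```python
import math

# Centers from the text; the exponents below are hypothetical example values
# standing in for the constants defined in the main text.
ligand_center1 = (1500, 1500)
ligand_center2 = (-1500, 1500)
start_exponent = 2       # hypothetical: log10 concentration at the origin
center_exponent = 4      # hypothetical: log10 concentration at a center

def euclidean_distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Distance from the origin to either center (both are equidistant).
origin_to_center = euclidean_distance((0, 0), ligand_center1)

def calc_concentration(pos):
    # Each gradient decays log-linearly with distance from its center;
    # the two contributions simply add.
    conc = 0.0
    for center in (ligand_center1, ligand_center2):
        dist = euclidean_distance(pos, center)
        exponent = (1 - dist / origin_to_center) * (center_exponent - start_exponent) + start_exponent
        conc += 10 ** exponent
    return conc
```

At the origin, each gradient contributes 10 ** start_exponent, so calc_concentration((0, 0)) evaluates to 200.0 under these example constants, and the value climbs toward 10 ** center_exponent as the position approaches either center.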
If the current concentration is higher than the past concentration, we sample the turning with a mean of 1.19π-0.1π=1.09π and a standard deviation of 0.63π; otherwise, we sample the turning with a mean of 1.19π and a standard deviation of 0.63π. The new direction is the sum of the turning and the past direction.Add the mean and standard deviation of turning as constants.#Constants for E.coli tumbling tumble_angle_mu = 1.19 tumble_angle_std = 0.63 We implement the tumble_move function as the following:def tumble_move(curr_dir, curr_conc, past_conc): #Sample the new direction correct = curr_conc > past_conc if correct: new_dir = np.random.normal(loc = tumble_angle_mu - 0.1, scale = tumble_angle_std) else: new_dir = np.random.normal(loc = tumble_angle_mu, scale = tumble_angle_std) new_dir *= np.random.choice([-1, 1]) new_dir += curr_dir new_dir = new_dir % (2 * math.pi) #keep within [0, 2pi] projection_h = math.cos(new_dir) #Horizontal displacement for next run projection_v = math.sin(new_dir) #Vertical displacement for next run tumble_time = np.random.exponential(tumble_time_mu) #Length of the tumbling return new_dir, projection_h, projection_v, tumble_time Update the simulate function by replacing projection_h, projection_v, tumble_time = tumble_move() with curr_direction, projection_h, projection_v, tumble_time = tumble_move(curr_direction, curr_conc, past_conc)Can’t get enough BioNetGen?Exercise 1:You should know the molecules involved (molecule types), reactions and reaction rate constants (reaction rules), the initial conditions (seed species), the quantities you are interested in observing (observables), your simulation methods and time steps.
Compartments and parameters should also be considered if applicable.Exercise 2:The complete code (you can download a completed BioNetGen file here: exercise_polymerization.bngl):begin modelbegin molecule types A(h,t)end molecule typesbegin reaction rules Initiation: A(h,t) + A(h,t) <-> A(h,t!1).A(h!1,t) 0.01,0.01 Polymerizationfree: A(h!+,t) + A(h,t) <-> A(h!+,t!1).A(h!1,t) 0.01,0.01 Polymerizationfree2: A(h,t) + A(h,t!+) <-> A(h,t!1).A(h!1,t!+) 0.01,0.01 Polymerizationbound: A(h!+,t) + A(h,t!+) <-> A(h!+,t!1).A(h!1,t!+) 0.01,0.01end reaction rulesbegin seed species A(h,t) 1000end seed speciesbegin observables Species A1 A==1 Species A2 A==2 Species A3 A==3 Species A5 A==5 Species A10 A==10 Species A20 A==20 Species ALong A>=30end observablesend modelsimulate({method=>"nf", t_end=>50, n_steps=>1000})The simulation outputs (note the concentrations are in log-scale):How to calculate steady state concentration in a reversible bimolecular reaction?Exercise 1:When the reaction begins, concentrations change toward the equilibrium concentrations. The system remains at the equilibrium state once it reaches it.Exercise 2:Use [A], [B], [AB] to denote the equilibrium concentrations. At equilibrium, we have kbind · [A] · [B] = kdissociate · [AB].Because of conservation of mass, if the system instead starts from no AB, our initial conditions will be a0 = b0 = 100, and ab0 = 0. (If we instead work from the “current” concentrations, a0 = b0 = 95, and ab0 = 5, how would you set up the calculations?)As in the main text, our original steady state equation can be modified to kbind · (a0 - [AB]) · (b0 - [AB]) = kdissociate · [AB].Solving this equation yields [AB] = 90.488.Exercise 3:If we add an additional 100 A molecules to the system, more AB will be formed. If you use the equation set up in the solution above, we can simply update a0 = 200. [AB] = 99.019.If kdissociate = 9 instead of 3, less AB will be present at the equilibrium state.
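The steady-state equation is a quadratic in [AB], so it can be solved directly rather than by simulation. A minimal sketch (assuming k_bind = 3 and k_dissociate = 3, rate constants consistent with the answers quoted in this solution):

```python
import math

def equilibrium_ab(k_bind, k_dis, a0, b0):
    """Solve k_bind * (a0 - x) * (b0 - x) = k_dis * x for the equilibrium [AB].

    Expanding gives k_bind*x**2 - (k_bind*(a0 + b0) + k_dis)*x + k_bind*a0*b0 = 0;
    the physically meaningful root is the smaller one, since [AB] <= min(a0, b0).
    """
    a = k_bind
    b = -(k_bind * (a0 + b0) + k_dis)
    c = k_bind * a0 * b0
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Then equilibrium_ab(3, 3, 100, 100) returns 90.488 (to three decimals), equilibrium_ab(3, 3, 200, 100) gives 99.019 for the extra 100 A molecules, and equilibrium_ab(3, 9, 100, 100) gives 84.115 for the faster dissociation rate.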
[AB] = 84.115.How to simulate a reaction step with the Gillespie algorithm?Exercise 1: Shorter, because molecules collide with each other and react more frequently.Exercise 2:In this system, we have λ = 100. The probability that exactly 100 reactions happen in the next second is\[\mathrm{Pr}(X = 100) = \dfrac{\lambda^n e^{-\lambda}}{n!} = 0.03986\,.\]The expected wait time is 1/λ = 0.01.The probability that the first reaction occurs after 0.02 seconds is\[\mathrm{Pr}(T > 0.02) = e^{-\lambda t} = 0.1353\,.\]Exercise 3:At the beginning of the simulation, only one type of reaction could occur: L + T → LT. The rate of reaction is kbind[L][T] = 100 molecule·s-1. Therefore we have λ = 100 molecule·s-1, and the expected wait time is thus 1/λ = 0.01 s.Although the expected wait time before the first reaction is considerably shorter than 0.1 s, it is still possible for the first reaction to happen after 0.1 s.After the first reaction, our system has 9 L, 9 T, and 1 LT molecules. There are two possible types of reactions that can occur: the forward reaction L + T → LT and the reverse reaction LT → L + T. The rate of the forward reaction is kbind[L][T] = 81 molecule·s-1, while the rate of the reverse reaction is kdissociate[LT] = 2 molecule·s-1. The total reaction rate is 83 molecule·s-1, and hence the expected wait time before the next reaction is 0.012 s. The probability of the forward reaction is 81/83 = 0.976, and the probability of the reverse reaction is 2/83 = 0.0241."
} ,
{
"title" : "Analyzing Structural Differences in the Bonding of SARS-CoV and SARS-CoV-2 with the ACE2 Enzyme",
"category" : "",
"tags" : "",
"url" : "/coronavirus/structural_diff",
"date" : "",
"content" : "Visualizing a region of structural differencesIn the previous lesson, we identified a region between residues 476 and 485 of the SARS-CoV-2 spike protein that corresponds to a structural difference between the SARS-CoV-2 and SARS-CoV RBMs. Our goal in this lesson is to determine whether the differences we have found affect binding affinity with the human ACE2 enzyme.We know from our work in this course that a tiny change can produce a big difference in the high-level behavior of a finely tuned system. It may therefore be the case that subtle changes in the ability of SARS-CoV-2 to stick to the ACE2 enzyme can change the virus’s infectiousness enough to greatly influence its spread through the human population.We will first use VMD to highlight the amino acids in the region of interest of the SARS-CoV-2 spike protein’s structure. If you are interested in doing so, please follow the tutorial below, which we will consult throughout the rest of this lesson.Visit tutorialAnalyzing three sites of conformational differencesOur region of interest is one of three sites showing significant conformational differences between the SARS-CoV-2 and SARS-CoV spike proteins that were identified by Shang et al.1. We will now discuss each of these three locations and see how they affect binding affinity between the spike protein and ACE2.Site 1: loop in ACE2-binding ridgeThe first location is our region of interest from the previous lesson and is found on a loop in a region called the ACE2 binding ridge. This region is shown in the figure below, in which SARS-CoV-2 is on top and SARS-CoV is on the bottom.Structural differences are challenging to show with a 2-D image, but if you followed the preceding tutorial, then we encourage you to use VMD to view the 3-D representation of the protein. 
Instructions on how to rotate a molecule and zoom in and out within VMD were given in our tutorial on finding local protein differences.STOP: See if you can identify the major structural difference between the proteins in the figure below. Hint: look at the yellow residue.A visualization of the loop in the ACE2-binding ridge that is conformationally different between SARS-CoV-2 (top) and SARS-CoV (bottom). The coronavirus RBD is shown in purple, and ACE2 is shown in green. Structural differences cause certain amino acid residues (highlighted in various colors) to behave differently between the two interactions.The most noticeable difference between SARS-CoV-2 and SARS-CoV relates to a “hydrophobic pocket” of three hydrophobic ACE2 residues at positions 82, 79, and 83 (methionine, leucine, and tyrosine). This pocket, which is colored silver in the above figure, is hidden away from the outside of the ACE2 enzyme to keep these amino acids separate from water. In SARS-CoV-2, the RBD phenylalanine residue at position 486 (yellow) inserts itself into the pocket, favorably interacting with ACE2. These interactions do not happen with SARS-CoV, and its corresponding RBD residue, a leucine at position 472 (yellow), is not inserted into the pocket 1.In what follows, we use a three-letter identifier followed by a number to indicate an amino acid’s identity and its position within the protein sequence. For example, the phenylalanine at position 486 of the SARS-CoV-2 spike protein would be called Phe486.Although the interaction with the hydrophobic pocket is the most critical difference between SARS-CoV-2 and SARS-CoV, there are two other key differences worth highlighting. First, in SARS-CoV-2, a main-chain hydrogen bond forms between Asn487 and Ala475 (shown in red in the above figure), which creates a more compact ridge conformation, pushing the loop containing Ala475 closer to ACE2.
This repositioning allows the N-terminal residue Ser19 of ACE2 (colored cyan in the above figure) to form a hydrogen bond with the main chain of Ala475. Second, Gln24 in ACE2 (colored orange in the above figure) forms a new contact with the RBM.Site 2: hotspot 31Hotspot 31 is not a failed Los Angeles nightclub but rather another site of notable conformational differences between SARS-CoV-2 and SARS-CoV, which was previously studied in SARS-CoV as early as 200823. This location earns its name because it involves a “salt bridge”, or a combination of hydrogen and ionic bonding between two amino acids, that takes place between Lys31 and Glu35. Hotspot 31 is colored red in the figure below.STOP: Again, see if you can spot the differences between SARS-CoV-2 and SARS-CoV.Visualizations of hotspot 31 in SARS-CoV-2 (top) and SARS-CoV (bottom). The RBD is shown in purple, and ACE2 is shown in green. In SARS-CoV, hotspot 31 corresponds to a salt bridge, which is broken in SARS-CoV-2 to form a new hydrogen bond.The figure above shows how the salt bridge is radically different in the two viruses. In SARS-CoV, the two residues appear to point towards each other because in the SARS-CoV RBM, Tyr442 (colored yellow in bottom figure) supports the salt bridge between Lys31 and Glu35 on ACE2. In contrast to Tyr442 in SARS-CoV, the corresponding amino acid in SARS-CoV-2 is the less bulky Leu455 (colored yellow in top figure), which provides less support to Lys31. This causes the salt bridge to break, so that Lys31 and Glu35 of ACE2 point in parallel toward the RBD residue Gln493 (colored blue). This change allows Lys31 and Glu35 to form hydrogen bonds with Gln493 in SARS-CoV-21.Site 3: hotspot 353Finally, we consider hotspot 353, which involves another salt bridge connecting Lys353 and Asp38 of ACE2. In this region, the difference between the residues is so subtle that it takes a keen eye to notice it.Visualizations of hotspot 353 in SARS-CoV-2 (top) and SARS-CoV (bottom).
The RBD is shown in purple, and ACE2 is shown in green. In SARS-CoV, the RBD residue Thr487 (yellow) stabilizes the salt bridge between ACE2 residues Lys353 and Asp38 (red). In SARS-CoV-2, the corresponding RBD residue Asn501 (yellow) provides less support, causing ACE2 residue Lys353 (red residue on the left) to be in a slightly different conformation and form a new hydrogen bond with the RBD 1.In SARS-CoV, the methyl group of Thr487 (colored yellow in bottom figure) supports the salt bridge on ACE2, and the side-chain hydroxyl group of Thr487 forms a hydrogen bond with the RBM backbone. The corresponding SARS-CoV-2 amino acid Asn501 (colored yellow in top figure) also forms a hydrogen bond with the RBM main chain. However, similar to what happened in hotspot 31, Asn501 provides less support to the salt bridge, causing Lys353 on ACE2 (colored red) to be in a different conformation. This allows Lys353 to form an extra hydrogen bond with the main chain of the SARS-CoV-2 RBM while maintaining the salt bridge with Asp38 on ACE21.You may be wondering how researchers can be so fastidious that they would notice all these subtle differences between the proteins, even if they know where to look. The fact is that they have help beyond subjective descriptions of how protein structure affects binding. In the next lesson, we will discuss how to quantify the improved binding of SARS-CoV-2 to ACE2 at the three locations described above.Next lesson Shang, J., Ye, G., Shi, K., Wan, Y., Luo, C., Aihara, H., Geng, Q., Auerbach, A., Li, F. 2020. Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221–224. https://doi.org/10.1038/s41586-020-2179-y ↩ ↩2 ↩3 ↩4 ↩5 Li, F. 2008. Structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections. J. Virol. 82, 6984–6991. ↩ Wu, K., Peng, G., Wilken, M., Geraghty, R. J. & Li, F. 2012.
Mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus. J. Biol. Chem. 287, 8904–8911. ↩ "
} ,
{
"title" : "An Introduction to Protein Structure Prediction",
"category" : "",
"tags" : "",
"url" : "/coronavirus/structure_intro",
"date" : "",
"content" : "Determining protein structure is fundamental to understanding protein functionProteins are one of the most important groups of macromolecules in living organisms, contributing to essentially all functions within them. Recall that in our introduction to transcription in a previous module, we introduced the “central dogma” of molecular biology, in which DNA is transcribed into RNA, which is then translated into protein. This process is represented in the figure reproduced below.The central dogma of molecular biology states that molecular information flows from DNA in the nucleus, into the RNA that is transcribed from DNA, and then into proteins that are translated from RNA. Image courtesy: Dhorpool, Wikimedia commons user.In this earlier module, we focused on how master regulators called transcription factors could affect the rates at which a given gene could be transcribed into RNA and translated into protein. In this module, we investigate what happens after translation.Before continuing, we should be a bit more precise about what we mean by “protein”. The ribosome converts triplets of RNA nucleotides into a chain of amino acids called a polypeptide. The polypeptide will then “fold” into a three-dimensional shape; this folding happens without any outside influence as the polypeptide settles into the most stable three-dimensional structure. Even if a polypeptide chain is unfolded, it will almost always fold back into essentially the same 3-D structure in a matter of microseconds. This means that nature is applying a “magic algorithm” that produces the structure of a protein from its sequence of amino acids. But how does this algorithm work?This brings us to our first biological problem of interest: can we predict the shape of a protein from its amino acid sequence? This structure prediction problem, which we will focus on in the first part of this module, is simple to state but deceptively difficult.
In fact, it has been an active area of biological research for several decades.You may be wondering why we care about protein structure. Knowing a protein’s shape is essential to determining its function and how it interacts with other proteins or molecules in its environment. (There are still a few thousand human proteins whose function is unknown.) And understanding protein interactions underlies a huge amount of biological research. For example, a disease may be caused by a faulty protein, in which case researchers want to find a drug (i.e., some other chemical substance) that binds to the protein and causes some change of interest in that protein, such as inhibiting its behavior.For a more visual example of how protein shape affects protein function, consider the following video of a ribosome (which is a complex of RNA and proteins) translating a messenger RNA into protein. For translation to succeed, the ribosome needs to have a very precise shape including a “slot” that the messenger RNA strand can fit into and be read.As we have seen throughout this course, molecular interactions are ruled by probability. Any two molecules may interact, but their rate of dissociation will be much higher if they do not fit together well. Furthermore, two interacting molecules need to not only collide with each other but also have the correct orientation in order to fit together.Because structure prediction is such a fundamental problem, researchers wish to catalog the enormously varied shapes that different proteins can have. For example, the figure below shows each “molecule of the month” in 2020 named by the Protein Data Bank (PDB). But the fact remains that proteins are submicroscopic; so how did researchers determine these shapes?Each “molecule of the month” in 2020 named by the PDB. Note how different the shapes are of all these proteins, which accomplish a wide variety of cellular tasks.
Note that the SARS-CoV-2 spike protein was the molecule of the month in June 2020. Source: https://pdb101.rcsb.org/motm/motm-by-date.Laboratory methods for determining protein structureIn this section, we will introduce two popular laboratory methods for accurately determining protein structure. These approaches are very sophisticated, and we appeal to high-quality videos explaining them if you are interested.In X-ray crystallography (sometimes called macromolecular crystallography), researchers first crystallize many copies of a protein and then shine an intense x-ray beam at the crystal. Light hitting the protein is diffracted, creating patterns from which the position of every atom in the protein can be inferred. If you are interested in learning more about X-ray crystallography, check out the following excellent two-part video series from The Royal Institution.X-ray crystallography is over a century old, and has been the de facto approach for protein structure determination for decades. Yet a newer method is now rapidly replacing X-ray crystallography.In cryo-electron microscopy (cryo-EM), researchers preserve thousands of copies of the protein in non-crystalline ice and then examine these copies with an electron microscope. Check out the following YouTube video from the University of California San Francisco for a detailed discussion of cryo-EM.Unfortunately, laboratory approaches for structure determination are expensive. X-ray crystallography costs upward of $2,000 per protein; furthermore, crystallizing a protein is a challenging task, and each copy of the protein must line up in the same way, which does not work for very flexible proteins. As for cryo-EM, an electron microscope is a very complicated machine that costs hundreds of thousands or millions of dollars (one microscope housed at Lawrence Berkeley National Lab cost $27 million).Protein structures that have been determined experimentally are typically stored in the PDB, which we mentioned above.
As of early 2020, this database contained over 160,000 proteins, most of which have been added since 2000.Before we set aside structure prediction, consider that a 2016 study estimated that humans have between 620,000 and 6.13 million protein isoforms (i.e., differently-shaped protein variants) 1. If we hope to catalog the proteins of all living things, then our work is just beginning.Another issue with laboratory methods of structure determination is that they require the ability to isolate the actual physical proteins. For example, to study bacterial proteins, we need to culture bacteria, and yet microbiologists have estimated that less than 2% of bacteria can currently be cultured in the lab.2What, then, can we do? Fortunately, although identifying protein structure is difficult, researchers have spent decades cataloging the genomes of thousands of species. Because of the central dogma of molecular biology, we know that much of this DNA winds up being translated into protein. As a result, biologists know the sequence of amino acids making up many proteins whose structures are unknown. In our case, although the SARS-CoV-2 genome had been sequenced in January 2020, the structure of its spike protein was unknown. Can we therefore use the sequence of amino acids corresponding to the SARS-CoV-2 spike protein to predict the protein’s 3-D shape? In other words, can we reverse engineer the magic algorithm that nature uses for protein folding?What makes protein structure prediction so difficult?Unfortunately, predicting protein structure from amino acid sequence is a very challenging problem. On the one hand, small perturbations in the primary structure of a protein can drastically change the protein’s shape and even render it useless. On the other, different amino acids can have similar chemical properties, and so some mutations will hardly change the shape of the protein at all. 
As a result, two very different amino acid sequences can fold into proteins with similar shapes and comparable function.For example, the following figure compares both the sequences and structures of hemoglobin subunit alpha from humans (Homo sapiens; PDB: 1si4), shortfin mako sharks (Isurus oxyrinchus; PDB: 3mkb), and emus (Dromaius novaehollandiae; PDB: 3wtg). Hemoglobin is the oxygen-transport protein in the blood, consisting of two alpha “subunit” proteins and two beta subunit proteins that combine into a protein complex; because hemoglobin is well-studied and much shorter than the SARS-CoV-2 spike protein, we will use it as an example throughout this module. The subunit alpha proteins across the three species are markedly different in terms of primary structure, and yet their 3-D structures are essentially identical.(Top) An amino acid sequence comparison of the first 40 (out of 140) amino acids of hemoglobin subunit alpha for three species: human, mako shark, and emu. A column is colored blue if all three species have the same amino acid, white if two species have the same amino acid, and red if all amino acids are different. Sequence identity calculates the number of positions in two amino acid sequences that share the same character. (Bottom) Side by side comparisons of the 3-D structures of the three proteins. The final figure on the right superimposes the first three structures to highlight their similarities.Another reason why protein structure prediction is so difficult is because a polypeptide is very flexible, with the ability to rotate in multiple ways at each amino acid, which means that the polypeptide is able to fold into a staggering number of different shapes. A good analogy for polypeptide flexibility is the “Rubik’s Twist” puzzle, shown below, which consists of a linear chain of flexible blocks that can form a huge number of different shapes.An animation of Rubik’s twist forming into a ball and then back into a linear chain.
Source: https://grabcad.com/library/rubik-s-snake-1.To explain why the protein is so flexible, we should say a bit more about the molecular structure of a polypeptide.An amino acid is formed of four parts. In the center, a carbon atom (called the alpha carbon) is connected to four different molecules: a hydrogen atom (H), a carboxyl group (–COOH), an amino group (-NH2), and a side chain (denoted “R” and often called an R group). The side chain is a molecule that differs between different amino acids and ranges in mass from a single hydrogen atom (glycine) all the way up to -C8H7N (tryptophan). The simplified structure of an amino acid is shown in the figure below.To form a polypeptide chain, consecutive amino acids are linked together during a condensation reaction in which the amino group of one amino acid is joined to the carboxyl group of another, while a water molecule (H2O) is expelled. This reaction is illustrated by the figure below.A condensation reaction joins two amino acids into a “dipeptide” by joining the amino group of one amino acid to the carboxyl group of the other. Source: https://bit.ly/3q0Ph8V.The resulting bond that is produced between the carbon atom of one amino acid’s carboxyl group and the nitrogen atom of the next amino acid’s amino group, called a peptide bond, is very strong. The peptide has very little rotation around this bond, which is almost always locked at 180°. As peptide bonds are formed between adjacent amino acids, the polypeptide chain takes shape, as shown in the figure below.A protein backbone formed of three amino acids.However, the bonds within an amino acid, joining the alpha carbon to its carboxyl group and amino group, are not as rigid. Like the Rubik’s twist, the polypeptide is free to rotate around these two bonds. 
This rotation produces two angles of interest, called the phi angle (φ) and psi angle (ψ) (see figure below), which are formed at the alpha carbon’s connections to its amino group and carboxyl group, respectively.A polypeptide chain of multiple amino acids with the torsion angles φ and ψ indicated. The angle ω indicates the angle of the peptide bond, which is typically 180°. Image courtesy: Adam Rędzikowski.Below is an excellent video from Jacob Elmer illustrating how changing φ and ψ at a single amino acid can drastically reorient a protein’s shape.A polypeptide with n amino acids will have n - 1 peptide bonds, meaning that its shape is influenced by n - 1 phi angles and n - 1 psi angles. If each bond has k stable conformations, then there are k^(2n-2) total possible conformations of the polypeptide. For example, if k is 3 and n is just 100 (a short polypeptide), then the number of potential protein structures is more than the number of atoms in the universe! The ability for the protein to reliably find a single conformation using the magic algorithm despite such an enormous number of potential shapes is called Levinthal’s paradox.3Although protein structure prediction is difficult, it is not impossible; the protein folding approach that nature uses is not, after all, magic. In the next lesson, we will examine how existing software attempts to replicate nature’s magic algorithm for folding a polypeptide chain into a 3-D protein structure. We will then place ourselves in the shoes of early SARS-CoV-2 researchers working before the structure of the virus’s spike protein had been experimentally validated to see if we can predict its structure and give biologists a head start on fighting the pandemic.Next lesson Ponomarenko, E. A., Poverennaya, E. V., Ilgisonis, E. V., Pyatnitskiy, M. A., Kopylov, A. T., Zgoda, V. G., Lisitsa, A. V., & Archakov, A. I. 2016. The Size of the Human Proteome: The Width and Depth.
International journal of analytical chemistry, 2016, 7436849. https://doi.org/10.1155/2016/7436849 ↩ Wade W. 2002. Unculturable bacteria–the uncharacterized organisms that cause oral infections. Journal of the Royal Society of Medicine, 95(2), 81–83. https://doi.org/10.1258/jrsm.95.2.81 ↩ Levinthal, C. 1969. How to Fold Graciously. Mossbaur Spectroscopy in Biological Systems, Proceedings of a meeting held at Allerton House, Monticello, Illinois. eds. Debrunner, P., Tsibris, J.C.M., Munck, E. University of Illinois Press Pages 22-24. ↩ "
} ,
{
"title" : "Software Tutorial: Adding Directionality to Spike Protein GNM Simulations Using ANM",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_ANM",
"date" : "",
"content" : "In this tutorial, we will use Normal Mode Wizard (NMWiz), a plugin in VMD that is designed to be a GUI for ProDy, to perform ANM analysis on the SARS-CoV-2 RBD. We will visualize the results in a cross-correlation map and square fluctuation plot and then produce ANM animations showing the predicted range of motion of the SARS-CoV-2 Spike RBD. Be sure to have installed VMD and know how to load molecules into the program. If you need a refresher, go to the VMD and Multiseq Tutorial.First, load 6vw1 into VMD by following the steps in the previous section Loading Molecules. Then, start up NMWiz by clicking Extensions > Analysis > Normal Mode Wizard.A small window will open. Select ProDy Interface.We want to focus only on the RBD of SARS-CoV-2, so we need to choose a new selection. In the ProDy Interface, change Selection to protein and chain F and click Select. Next, make sure that ANM calculation is selected for ProDy job:. Check the box for write and load cross-correlations heatmap. Finally, click Submit Job.Note: Let the program run and do not click any of the VMD windows, as this may cause the program to crash or become unresponsive. The job can take from a few seconds to a couple of minutes. When the job is completed, you will see a new window NMWiz - 6vw1_anm ... and the cross-correlation heatmap appear.Now that the ANM calculations are completed, you will see the visualization displayed in VMD Main. Disable the original visualization of 6vw1 by double-clicking on the letter ‘D’. The red color indicates that it is disabled.In OpenGL Display, you will be able to see the protein with numerous arrows that represent the calculated fluctuations.To actually see the protein move as described by the arrows, we have to create the animation. Go back to the NMWiz - 6vw1_anm... window and click Make next to Animations.VMD Main should now display a new row for the animation.The animation should also be visible in OpenGL Display.
However, the previous visualizations are somewhat in the way. We can disable them in the same way as before by double-clicking on the letter ‘D’.Now, you should be able to clearly see the animation of the ANM fluctuations of 6vw1.Now let’s go back to the main text to interpret the results.Return to main text"
} ,
{
"title" : "Software Tutorial: Molecular Dynamics Analysis using DynOmics 1.0",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_DynOmics",
"date" : "",
"content" : "Molecular Dynamics Analysis using DynOmics 1.0In this tutorial, we will be using a publicly available web server, DynOmics, by Dr. Hongchun Li et al. in the Bahar Lab at the University of Pittsburgh, School of Medicine. This server is dedicated to performing molecular dynamics analysis by integrating the Gaussian Network Model (GNM) and the Anisotropic Network Model (ANM).Head over to the main page of DynOmics by following this link DynOmics 1.0. Here, we can see many options that we can change to customize our analysis. But for now, we will stick to the default options. To choose our target molecule, we need to input the PDB ID. Since we will be performing the analysis on the SARS-CoV-2 S protein, we will use the PDB ID: 6vxx. Then, click Submit.Once the analysis is complete, you will see all the ANM and GNM results listed next to an interactive visualization of the protein. In addition, the visualization is colored based on the predicted protein flexibility from the slow mode analysis.Let’s explore some of the results starting with Molecular Motions - Animations. Here we see an animated, interactive visualization of the protein with the same coloring as before. This time, we are able to see the actual predicted motion of the protein fluctuation based on ANM calculations. On the right, we can customize the animation by changing the vibrations and vectors to make the motions more pronounced. More importantly, we can change the Mode index. Recall that we have learned that the motion of protein fluctuations can be broken down into a collection of modes. By changing the Mode index, we can see the contribution of each mode to the motion. Another neat thing that we can do is to download the calculations as a .nmd file and visualize it in VMD!If you are interested in using VMD, open the software and go to Extensions > Analysis > Normal Mode Wizard. Then, click Load NMD File and select the .nmd file that you downloaded.
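The decomposition of fluctuations into modes that DynOmics reports can be illustrated with a toy GNM calculation in Python (numpy only). The 6-residue straight-line "backbone" below is purely illustrative, not taken from 6vxx; only the 7.3 Å cutoff mirrors the server's default:

```python
import numpy as np

def gnm_modes(coords, cutoff=7.3):
    """Build the GNM Kirchhoff (connectivity) matrix from C-alpha coordinates
    and return its nonzero eigenvalues and eigenvectors, slowest mode first."""
    n = len(coords)
    gamma = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                gamma[i, j] = gamma[j, i] = -1.0
    np.fill_diagonal(gamma, -gamma.sum(axis=1))  # diagonal = contact counts
    vals, vecs = np.linalg.eigh(gamma)           # eigenvalues in ascending order
    return vals[1:], vecs[:, 1:]                 # drop the trivial zero mode

# Toy "backbone": 6 residues spaced 3.8 angstroms apart along a line,
# so only sequential neighbors fall within the cutoff.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(6)])
evals, evecs = gnm_modes(coords)

# Per-residue square fluctuations: slow (small-eigenvalue) modes dominate.
sq_fluct = (evecs ** 2 / evals).sum(axis=1)
```

For this toy chain, the square fluctuations come out largest at the two ends and smallest in the middle, the same qualitative picture as a B-factor plot; GNM tools such as ProDy apply the same recipe to actual PDB coordinates.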
Now that the ANM calculation is loaded into VMD, you can customize the visualization and recreate the animation by clicking Animation: Play.Next, head over to Mean-Square Fluctuations of Residues. On this page, you will see two visualizations of the protein, labelled “Theoretical B-Factors” and “Experimental B-Factors” as well as the B-factor plot. Recall that theoretical B-factors are calculated during the GNM analysis while the experimental B-factors are included in the PDB. On the bottom, we can see the plot of the B-factors across the entire protein split into chains.The next result page is Selected Modes - Color-coded Diagrams. Here, we can see the shape of each individual slow mode or an average of the slowest 1-2, 1-3, or 1-10. Again, we can see a wide peak that corresponds to the RBD of the S protein. You can also click on the plot to highlight the residue on the interactive visualizations.In Cross-correlations between Residue Fluctuations, we can see the full cross-correlation heat map and see the correlation between each pair of residues.For Inter-residue Contact Map, you will see a visualization of the connected alpha-Carbon structure based on the cutoff distance. On the right is the Connectivity Map that indicates which pair of residues are within the cutoff distance. The default is set to 7.3 Å. If you want to change the threshold, you will have to redo the calculations and change the cutoff distance in Advanced options.Finally, in Properties of GNM Mode Spectrum, we can see two different plots on modes: Frequency Dispersion and Degree of Collectivity. In the frequency dispersion plot, a high value indicates a slow mode with low frequency; such modes are expected to be highly related to biological functions. Recall that the slowest modes contribute the most to the protein fluctuation. The degree of collectivity plot measures the extent to which structural elements (residues) move together for each mode. 
A high degree of collectivity indicates that the mode is highly cooperative and engages a large portion of the structure. A low degree of collectivity indicates that the mode only affects a small region.That is all for how to obtain the structural dynamics results from DynOmics. If you are interested in the other results, DynOmics has provided its own tutorial here.We will now head back to the main text in order to analyze our GNM/ANM results of the SARS-CoV-2 S protein and compare it with the SARS-CoV S protein to see if we can distinguish any significant differences.If you would rather perform GNM/ANM analysis using the command line, ProDy, and VMD, please go to the following tutorialsVisit GNM tutorialVisit ANM tutorialReturn to main text"
} ,
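The DynOmics tutorial above compares theoretical B-factors (from GNM) with the experimental B-factors stored in the PDB. That comparison reduces to a Pearson correlation between two residue-wise profiles; here is a minimal stdlib-Python sketch of that check, using made-up toy values rather than real 6vxx data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length residue profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy residue-wise profiles (hypothetical values, NOT real 6vxx B-factors):
theoretical = [0.2, 0.5, 1.4, 1.1, 0.3, 0.2]     # GNM-style fluctuations
experimental = [20.0, 35.0, 80.0, 70.0, 25.0, 18.0]  # PDB-style B-factors

r = pearson(theoretical, experimental)  # close to 1 when the profiles agree
```

A high correlation between the two profiles is what suggests the elastic-network model captures the experimentally observed flexibility.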
{
"title" : "Software Tutorial: Analysis of Coronavirus Spike Proteins Using GNM",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_GNM",
"date" : "",
"content" : "Gaussian Network Model CalculationsIn this tutorial, we will be performing GNM calculations on one of the chains in the SARS-CoV-2 S protein and then visualizing the results in different maps and plots. Please be sure to have 6vxx.pdb downloaded in the current working directory. (You can also download it directly while parsing as explained in the RMSD Tutorial.)First, follow the steps in Setting up ProDy to start up IPython and import the necessary functions.Next, we will parse in 6vxx and set it as the variable spike.In[#]: spike = parsePDB('6vxx.pdb')For this GNM calculation, we will focus only on the alpha-carbons of Chain A. We will create the variable calphas with the selection.In[#]: calphas = spike.select('calpha and chain A')Now, we will instantiate a GNM instance and build the corresponding Kirchhoff matrix. You can pass parameters for the cutoff (threshold distance between atoms) and gamma (spring constant). The defaults are 10.0 Å and 1.0, respectively. Here, we will set the cutoff to be 20.0 Å.In[#]: gnm = GNM('SARS-CoV-2 Spike (6vxx) Chain A Cutoff = 20.0 A') #This is the title that will appear on top of the plotsIn[#]: gnm.buildKirchhoff(calphas, cutoff=20.0)For the creation of normal modes, the default is 20 non-zero modes. This value can be changed and zero modes can be kept if desired, e.g., gnm.calcModes(50, zeros=True). We will use the default. In addition, we will compute hinge sites for later use in the slow mode shape plot. 
These sites represent places in the protein where the fluctuations change in relative direction.In[#]: gnm.calcModes()In[#]: hinges = gnm.getHinges()(Optional) Information about the GNM and Kirchhoff matrix can be pulled with the following commands.In[#]: gnm.getEigvals()In[#]: gnm.getEigvecs()In[#]: gnm.getCovariance()#To get information specifically on the slowest mode (which is always indexed at 0):In[#]: slowMode = gnm[0]In[#]: slowMode.getEigval()In[#]: slowMode.getEigvec()We have now successfully created the GNM calculations and can generate the maps and plots. Make sure to save the visualization (if desired) and close the plot before creating another. We will discuss how to interpret these visualizations back in the main text.Contact Map:In[#]: showContactMap(gnm);Cross-correlations:In[#]: showCrossCorr(gnm);Slow Mode Shape:In[#]: showMode(gnm[0], hinges=True)In[#]: grid();Square Fluctuations:In[#]: showSqFlucts(gnm[0], hinges=True);Now, let’s head back to the main text to learn how to read our visualizations and analyze our results.Return to main text"
} ,
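Conceptually, the gnm.buildKirchhoff step in the tutorial above builds a matrix whose off-diagonal entries are -1 for alpha-carbon pairs within the cutoff distance, with each diagonal entry holding that residue's contact count. A pure-Python sketch of that construction (toy coordinates for illustration; this is not ProDy's actual implementation):

```python
import math

def build_kirchhoff(coords, cutoff=10.0):
    """Kirchhoff (connectivity) matrix: K[i][j] = -1 if residues i and j
    are within the cutoff, and K[i][i] counts residue i's contacts."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                K[i][j] = K[j][i] = -1.0
                K[i][i] += 1.0
                K[j][j] += 1.0
    return K

# Toy alpha-carbon coordinates in angstroms (hypothetical, not 6vxx data).
coords = [(0, 0, 0), (5, 0, 0), (20, 0, 0)]
K = build_kirchhoff(coords, cutoff=10.0)
# Residues 0 and 1 are in contact; residue 2 contacts neither.
```

The GNM modes the tutorial plots are then the eigenvectors of this matrix, which is why every row and column sums to zero.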
{
"title" : "Software Tutorial: Computing the Energy Contributed by a Local Region of the SARS-CoV-2 Spike Protein Bound with the Human ACE2 Enzyme",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_NAMD",
"date" : "",
"content" : "In this tutorial, we will show how to use NAMD Energy to calculate the interaction energy for a bound complex, as well as to determine how much a given region of this complex contributes to the overall potential energy. We will use the chimeric SARS-CoV-2 RBD-ACE2 complex (PDB entry: 6vw1) and compute the interaction energy contributed by the loop site that we identified as a region of structural difference in a previous lesson.To determine the energy contributed by a region of a complex, we will need a “force field”, an energy function with a collection of parameters that determine the energy of a given structure based on the positional relationships between atoms. There are many different force fields depending on the specific type of system being studied (e.g. DNA, RNA, lipids, proteins). There are many different approaches for generating a force field; for example, Chemistry at Harvard Macromolecular Mechanics (CHARMM)1 offers a popular collection of force fields.To get started, make sure you have installed VMD and know how to load molecules into the program; if you need a refresher, visit the VMD and Multiseq Tutorial. Then, download NAMD; you may be asked to provide the path to your NAMD installation.Creating a protein structure fileNAMD needs to utilize the information in the force field to calculate the potential energy of a protein. To do this, it needs a protein structure file (PSF). A PSF, which is molecule-specific, contains all the information required to apply a force field to a molecular system.2 Fortunately, there are programs that can generate a PSF given a force field and a .pdb file containing a protein structure. See this NAMD tutorial for more information.First, load 6vw1 into VMD. We then need to create a protein structure file of 6vw1 to simulate the molecule. We will be using the VMD plugin Automatic PSF Builder to create the file. 
From VMD Main, click Extensions > Modeling > Automatic PSF Builder.In the AutoPSF window, make sure that the selected molecule is 6vw1.pdb and that the output is 6vw1_autopsf. Click Load input files. In step 2, click Protein and then Guess and split chains using current selections. Then click Create chains and then Apply patches and finish PSF/PDB.During this process, you may see an error message stating MOLECULE DESTROYED. If you see this message, click Reset Autopsf and repeat the above steps. The selected molecule will change, so make sure that the selected molecule is 6vw1.pdb when you start over. Failed molecules remain in VMD, so deleting the failed molecule from VMD Main is recommended before each new attempt.If the PSF file is successfully created, then you will see a message stating Structure complete. The VMD Main window will also have an additional line.Using NAMD Energy to compute the energy of the SARS-CoV-2 RBD loop regionNow that we have the PSF file, we can proceed to NAMD Energy. In VMD Main, click Extensions > Analysis > NAMD Energy. The NAMDEnergy window will pop up. First, change the molecule to be the PSF file that we created.We now want to calculate the interaction energy between the RBD and ACE2. Recall that the corresponding chain pairs are chain A (ACE2)/chain E (RBD) and chain B (ACE2)/chain F (RBD). As we did in the previous tutorial, we will use the chain B/F pair. Put protein and chain B and protein and chain F for Selection 1 and Selection 2, respectively.Next, we want to calculate the main protein-protein interaction energies, divided into electrostatic and van der Waals forces. Under Output File, enter your desired name for the results (e.g., SARS-2_RBD-ACE2_energies). Next, we need to give NAMDEnergy the parameter file par_all36_prot.prm. This file should be found at VMD > plugins > noarch > tcl > readcharmmpar1.3 > par_all36_prot.prm. 
Finally, click Run NAMDEnergy.The output file will be created in your current working directory and can be opened with a simple text editor. The values of your results may vary slightly between repeated calculations.Note: You may be wondering why the interaction energy comes out to be a negative number. In physics, a negative value indicates an attractive force between two molecules, and a positive value indicates a repulsive force.We will now focus on the interaction energy between the SARS-CoV-2 RBD loop site (residues 482 to 486) and ACE2. In the NAMDEnergy window, enter protein and chain B for Selection 1 and protein and chain F and (resid 482 to 486) for Selection 2. Keep all other settings the same. You should see output results similar to the following.The above results seem to indicate that the interaction between the SARS-CoV-2 RBD and ACE2 is a favorable interaction, and that the loop region contributes to this bonding. Yet our goal was to compare the total energy of the bound RBD-ACE2 complex in SARS-CoV-2 against that of SARS-CoV, as well as to compare the energy contributed by the three regions of structural difference that we identified in the main text. We will leave these comparisons to you as an exercise, and we will discuss the results in the main text.STOP: First, compute the total energy of the SARS-CoV RBD complex with ACE2 (PDB entry: 2ajf). How does it compare against the energy of the SARS-CoV-2 complex? Then, compute the energy contributed by hotspot 31 and hotspot 353 in SARS-CoV-2, as well as that of the corresponding regions and the loop region in SARS-CoV. (Consult the table below as needed.) How do the energy contributions of corresponding regions compare? Is this surprising, and what can we conclude?Note: In the table below, “chain B” is part of the ACE2 enzyme, and “chain F” is part of the viral spike protein RBD for the virus indicated. 
Model Region Selection 1 Selection 2 SARS-CoV-2 (6vw1) Total protein and chain B protein and chain F SARS-CoV (2ajf) Total protein and chain B protein and chain F SARS-CoV-2 (6vw1) Loop protein and chain B protein and chain F and (resid 482 to 486) SARS-CoV (2ajf) Loop protein and chain B protein and chain F and (resid 468 to 472) SARS-CoV-2 (6vw1) Hotspot31 protein and chain B protein and chain F and resid 455 SARS-CoV-2 (6vw1) Hotspot31 protein and chain B and (resid 31 or resid 35) protein and chain F SARS-CoV (2ajf) Hotspot31 protein and chain B protein and chain F and resid 442 SARS-CoV (2ajf) Hotspot31 protein and chain B and (resid 31 or resid 35) protein and chain F SARS-CoV-2 (6vw1) Hotspot353 protein and chain B protein and chain F and resid 501 SARS-CoV-2 (6vw1) Hotspot353 protein and chain B and (resid 38 or resid 353) protein and chain F SARS-CoV (2ajf) Hotspot353 protein and chain B protein and chain F and resid 487 SARS-CoV (2ajf) Hotspot353 protein and chain B and (resid 38 or resid 353) protein and chain F Return to main text https://www.charmmtutorial.org/index.php/The_Energy_Function ↩ https://www.ks.uiuc.edu/Training/Tutorials/namd/namd-tutorial-unix-html/node23.html ↩ "
} ,
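The note in the NAMD tutorial above about negative interaction energies can be checked with a back-of-the-envelope Coulomb calculation: opposite charges give a negative (attractive) energy, like charges a positive (repulsive) one. A sketch with illustrative partial charges (this is not a NAMD force-field calculation; the constant is the one commonly used in molecular-mechanics codes):

```python
# Simplified pairwise Coulomb energy U = k * q1 * q2 / r (vacuum, no cutoff).
K_COULOMB = 332.06  # approx. kcal*A/(mol*e^2), as used in common force fields

def coulomb_energy(q1, q2, r):
    """Electrostatic energy in kcal/mol for point charges (units of e)
    separated by r angstroms."""
    return K_COULOMB * q1 * q2 / r

attract = coulomb_energy(+0.5, -0.5, 3.0)  # opposite charges: negative (attractive)
repel = coulomb_energy(+0.5, +0.5, 3.0)    # like charges: positive (repulsive)
```

This is why a more negative total interaction energy in the NAMDEnergy output indicates a more favorably bound RBD-ACE2 complex.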
{
"title" : "Software Tutorial: Applying Principal Components Analysis to Nuclear Image Boundaries",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/tutorial_PCA",
"date" : "",
"content" : "Step 3 PCA Model GenerationHaving completed Steps 1 and 2, all of our remaining images, and their associated labels, should be pre-processed and ready to train into a PCA model. In CellOrganizer, this corresponds to demo2D08.In this step of the pipeline, we return to MATLAB for running a modified version of CellOrganizer’s demo2D08 in order to generate the PCA model for our white blood cells. This model will then be used to plot the shape space by either cell type or cell class. Furthermore, we do some post-processing cleanup to ensure our resulting model can be easily read into the visualization code by using only the first three principal components.Open MATLAB and navigate into your CellOrganizer directory. Then, run the following command in the MATLAB command window:> setupRun the following commands in the MATLAB command window:> clear> clc> cd ~/Desktop/WBC_PCAPipeline/Step3_PCAModel> WBC_PCAModelAs a result, the Step3_PCAModel and Step4_Visualization directories have been updated. The principal components along with the label assigned to each cell are captured in the WBC_PCA.csv file within the Step4 directory. Information about the images used and the CellOrganizer generated shape space can be found by clicking on Step3_PCAModel/report/index.html.Note: For any subsequent run of the WBC_PCAModel file, make sure to delete any log and param files that have been created from a previous run. All other files will be overwritten unless preemptively removed from the WBC_PCAModel file’s access. Saving the files can be done by either compressing the files into a zip folder or removing them from the directory.We next want to view our model results. 
First, run the following commands in the MATLAB command window:> load('WBC_PCA.mat');> scr = model.nuclearShapeModel.score;Double-click on the scr variable in the Workspace window.In the matrix on your screen, each row represents an image and each column represents a successive principal component for the image. For the purpose of our shape space visualization, we will only be focusing on the first three principal components."
} ,
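For intuition about what each column of the score matrix in the MATLAB step above holds, here is a minimal 2-D PCA sketch: center the data, form the covariance matrix, and take its leading eigenvector as the first principal component. It uses the closed form for the 2x2 case and toy points, not CellOrganizer's model:

```python
import math

def first_pc_2d(points):
    """First principal component of 2-D points, via the closed-form
    leading eigenvector of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n           # var(x)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # cov(x, y)
    c = sum((p[1] - my) ** 2 for p in points) / n           # var(y)
    # Largest eigenvalue of [[a, b], [b, c]]:
    lam = (a + c + math.sqrt((a - c) ** 2 + 4 * b ** 2)) / 2
    # A corresponding (unnormalized) eigenvector is (b, lam - a):
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:  # axis-aligned data: pick the higher-variance axis
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Toy points stretched along y = x, so the first PC is near (0.707, 0.707).
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8)]
pc1 = first_pc_2d(pts)
```

The score matrix is just each shape's coordinates along such components, which is why keeping the first three columns preserves most of the shape variation.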
{
"title" : "Software Tutorial: Using ab initio Modeling to Predict the Structure of Hemoglobin Subunit Alpha",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_ab_initio",
"date" : "",
"content" : "In this software tutorial, we will use the popular ab initio modeling software called QUARK. Because of the complexity of ab initio algorithms, QUARK limits us to polypeptides with at most 200 amino acids, and so rather than determining the structure of the SARS-CoV-2 spike protein (each monomer has 1281 amino acids), we will work with hemoglobin subunit alpha (PDB entry 1si4), which is only 141 amino acids long.Before beginning, if you have not used QUARK before, then you will need to register for a QUARK account to use this software. After registering, you will receive an email containing a temporary password.Then, download the primary sequence of human hemoglobin subunit alpha. Visit the QUARK submission page, where you should take the following steps as shown in the figure below. Copy and paste the sequence into the first box. Add your email address and password. Click Run QUARK.Even though this is a short protein, it will take at least a few hours to run your submission, depending on server load. When your job has finished, you will receive an email notification and be able to download the results. In the meantime, you may like to join us back in the main text.Note: QUARK will not return a single best answer but rather the top five best-scoring structures that it finds. Continuing the exploration analogy from the text, think of these results as the five lowest points in the search space that QUARK is able to find.In the main text, we will show a figure of our models and compare them to the known structure of human hemoglobin subunit alpha from the PDB entry 1si4. You can also download our completed models if you like.Return to main text"
} ,
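Since QUARK caps submissions at 200 residues, it is worth validating a sequence before pasting it into the submission box. A small sketch that parses a single-record FASTA string and checks it against that limit (the header and fragment below are hypothetical, for illustration only):

```python
# QUARK accepts single polypeptides of at most 200 amino acids.
MAX_QUARK_LENGTH = 200
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes

def parse_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">") if lines and lines[0].startswith(">") else ""
    seq = "".join(ln for ln in lines if not ln.startswith(">")).upper()
    return header, seq

def ok_for_quark(seq):
    """True if the sequence is non-empty, pure amino acids, and within the limit."""
    return 0 < len(seq) <= MAX_QUARK_LENGTH and set(seq) <= VALID_AA

# Hypothetical 20-residue fragment, not the full hemoglobin sequence:
fasta = ">toy_fragment\nMVLSPADKTN\nVKAAWGKVGA"
header, seq = parse_fasta(fasta)
```

A 141-residue sequence like hemoglobin subunit alpha passes this check; a full spike monomer would not.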
{
"title" : "Software Tutorial: Modeling bacterial adaptation to changing attractant",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_adaptation",
"date" : "",
"content" : "In this tutorial, we will extend the BioNetGen model covered in the phosphorylation tutorial to add the methylation mechanisms described in the main text to our ongoing model of bacterial chemotaxis. Our model will be based on the model by Spiro et al.1We will also add compartmentalization to our model, which will allow us to differentiate molecules that occur inside and outside of the cell.Finally, after running our model, we will see how methylation can be used to help the bacterium adapt to a relative change in attractant concentration. For reference, consult the figure below, reproduced from the main text, for an overview of the chemotaxis pathway.The chemotaxis signal-transduction pathway with methylation included. CheA phosphorylates CheB, which demethylates MCPs, while CheR methylates MCPs. Blue lines denote phosphorylation, grey lines denote dephosphorylation, and the green arrow denotes methylation. Image modified from Parkinson Lab’s illustrations.To get started, create a copy of your file from the phosphorylation tutorial and save it as adaptation.bngl. If you would rather not follow along below, you can download a completed BioNetGen file here: adaptation.bngl.Specifying molecule typesWe will first add all molecules needed for our model. As mentioned in the main text, we will assume that an MCP can have one of three methylation states: low (A), medium (B), and high (C). We also need to include a component that will allow for the receptor to bind to CheR. As a result, we update our MCP molecule to T(l,r,Meth~A~B~C,Phos~U~P).Furthermore, we need to represent CheR and CheB; recall that CheR binds to and methylates receptor complexes, while CheB demethylates them. CheR can bind to T, so that we will need the molecule CheR(t). CheB is phosphorylated by the receptor complex (specifically CheA), and so it will be represented as CheB(Phos~U~P). 
Later we will specify reactions describing how CheR and CheB change the methylation states of receptor complexes.begin molecule types L(t) T(l,r,Meth~A~B~C,Phos~U~P) CheY(Phos~U~P) CheZ() CheB(Phos~U~P) CheR(t)end molecule typesIn the observables section, we specify that we are interested in tracking the concentrations of the bound ligand, phosphorylated CheY and CheB, and the receptor at each methylation level.begin observables Molecules bound_ligand L(t!1).T(l!1) Molecules phosphorylated_CheY CheY(Phos~P) Molecules low_methyl_receptor T(Meth~A) Molecules medium_methyl_receptor T(Meth~B) Molecules high_methyl_receptor T(Meth~C) Molecules phosphorylated_CheB CheB(Phos~P)end observablesDefining reactionsWe now expand our reaction rules to include methylation. First, we change the autophosphorylation rules of the receptor to have different rates depending on whether the receptor is bound and its current methylation level, which produces six rules.Note: We cannot avoid combinatorial explosion in the case of these phosphorylation reactions because they take place at different rates. In what follows, we use experimentally verified reaction rates.#Receptor complex (specifically CheA) autophosphorylation#Rate dependent on methylation and binding states#Also on free vs. bound with ligandTaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phosTbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6Next, we will need reactions for CheR binding to receptor complexes and methylating them. 
First, we consider the binding of CheR to the receptor.#CheR binding to receptor complexTRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_disSecond, we will need multiple reaction rules for methylation of receptors by CheR because the rate of the reaction can depend on whether a ligand is already bound to the receptor as well as the current methylation level of the receptor. This gives us four rules, since a receptor at the “high” methylation level (C) cannot have increased methylation. Note also that the rate of the methylation reaction is higher if the methylation level is low (A) and significantly higher if the receptor is already bound.#CheR methylating the receptor complex#Rate of methylation is dependent on methylation states and ligand bindingTRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_disTaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_methTbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1TaRLigandMeth: T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3Finally, we need reactions for CheB. First, we consider its phosphorylation by the receptor and its autodephosphorylation. Each of these two reactions occurs at a rate that is independent of any other state of the receptor or CheB.#CheB is phosphorylated by receptor complex, and autodephosphorylatesCheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phosCheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephosCheB also demethylates the receptor complex, at a rate that depends on the current methylation state. 
(We do not include state A since it cannot be further demethylated.)#CheB demethylates receptor complex#Rate dependent on methylation statesTbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demethTcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demethWe are now ready to combine the above reaction rules with the reaction rules we are inheriting from the original model (ligand-receptor binding and CheY phosphorylation/dephosphorylation) to give us a complete set of reaction rules.begin reaction rules #Ligand-receptor binding LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #CheY phosphorylation by T and dephosphorylation by CheZ YPhos: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDephos: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos #Receptor complex (specifically CheA) autophosphorylation #Rate dependent on methylation and binding states #Also on free vs. bound with ligand TaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phos TbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1 TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8 TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0 TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8 TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6 #CheR binds to and methylates receptor complex #Rate dependent on methylation states and ligand binding TRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_dis TaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_meth TbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1 TaRLigandMeth: 
T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30 TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3 #CheB is phosphorylated by receptor complex, and autodephosphorylates CheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phos CheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephos #CheB demethylates receptor complex #Rate dependent on methylation states TbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demeth TcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demethend reaction rulesAdding CompartmentsIn biological systems, the plasma membrane separates molecules inside of the cell from the external environment. In our chemotaxis system, ligands are outside of the cell, receptors and flagellar proteins are on the membrane, and CheY, CheR, CheB, CheZ are inside the cell.BioNetGen allows us to compartmentalize our model based on the location of different molecules. Although our model does not call for compartmentalization, it has value in models where we need different concentrations based on different cellular compartments, influencing the rates of reactions involving molecules within these compartments. For this reason, we will take the opportunity to add compartmentalization into our model.Below, we define three compartments corresponding to extra-cellular space (outside the cell), the plasma membrane, and the cytoplasm (inside the cell). 
Each row indicates four parameters: the name of the compartment; the dimension (2-D or 3-D); surface area (2-D) or volume (3-D) of the compartment; and the name of the parent compartment - the compartment that encloses this current compartment.If you are interested, more information on compartmentalization can be found on pages 54-55 of Sekar and Faeder’s primer on rule-based modeling: http://www.lehman.edu/academics/cmacs/documents/RuleBasedPrimer-2011.pdf.begin compartments EC 3 100 #um^3 PM 2 1 EC #um^2 CP 3 1 PM #um^3end compartmentsSpecifying concentrations and reaction ratesTo add compartmentalization information in the seed species section of our BioNetGen model, we use the notation @location before the specification of the concentrations. In what follows, we specify initial concentrations of ligand, receptor, and chemotaxis enzymes at different states. The distribution of molecule concentrations at each state is very difficult to verify experimentally; the distribution provided here approximates equilibrium concentrations in our simulation, and they are within a biologically reasonable range.2begin seed species @EC:L(t) L0 @PM:T(l,r,Meth~A,Phos~U) T0*0.84*0.9 @PM:T(l,r,Meth~B,Phos~U) T0*0.15*0.9 @PM:T(l,r,Meth~C,Phos~U) T0*0.01*0.9 @PM:T(l,r,Meth~A,Phos~P) T0*0.84*0.1 @PM:T(l,r,Meth~B,Phos~P) T0*0.15*0.1 @PM:T(l,r,Meth~C,Phos~P) T0*0.01*0.1 @CP:CheY(Phos~U) CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesFinally, we need to assign values to the parameters. We will assume that we start with a zero ligand concentration. 
We then assign the initial concentration of each molecule and rates of our reactions based on in vivo stoichiometry and parameter tuning.34Note: Although we discussed reaction rules first, the parameters section below has to appear before the reaction rules section.begin parameters NaV 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 miu 1e-6 L0 0 #number of molecules/cell T0 7000 #number of molecules/cell CheY0 20000 #number of molecules/cell CheZ0 6000 #number of molecules/cell CheR0 120 #number of molecules/cell CheB0 250 #number of molecules/cell k_lr_bind 8.8e6/NaV #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_TaUnbound_phos 7.5 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV #Z dephosphorylates Y k_TR_bind 2e7/NaV #Receptor-CheR binding k_TR_dis 1 #Receptor-CheR dissociation k_TaR_meth 0.08 #CheR methylates receptor complex k_B_phos 1e5/NaV #CheB phosphorylation by receptor complex k_B_dephos 0.17 #CheB autodephosphorylation k_Tb_demeth 5e4/NaV #CheB demethylates receptor complex k_Tc_demeth 2e4/NaV #CheB demethylates receptor complexend parametersCompleting our adaptation simulationWe will be ready to simulate once we place the following code after end model. 
We will run our simulation for 800 seconds.generate_network({overwrite=>1})simulate({method=>"ssa", t_end=>800, n_steps=>800})The following code contains our complete simulation.begin modelbegin molecule types L(t) T(l,r,Meth~A~B~C,Phos~U~P) CheY(Phos~U~P) CheZ() CheB(Phos~U~P) CheR(t)end molecule typesbegin observables Molecules bound_ligand L(t!1).T(l!1) Molecules phosphorylated_CheY CheY(Phos~P) Molecules low_methyl_receptor T(Meth~A) Molecules medium_methyl_receptor T(Meth~B) Molecules high_methyl_receptor T(Meth~C) Molecules phosphorylated_CheB CheB(Phos~P) Molecules CheRbound T(r!2).CheR(t!2)end observablesbegin parameters NaV2 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 miu 1e-6 L0 1e7 T0 7000 CheY0 20000 CheZ0 6000 CheR0 120 CheB0 250 k_lr_bind 8.8e6/NaV2 #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_TaUnbound_phos 7.5 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV2 #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV2 #Z dephosphorylates Y k_TR_bind 2e7/NaV2 #Receptor-CheR binding k_TR_dis 1 #Receptor-CheR dissociation k_TaR_meth 0.08 #CheR methylates receptor complex k_B_phos 1e5/NaV2 #CheB phosphorylation by receptor complex k_B_dephos 0.17 #CheB autodephosphorylation k_Tb_demeth 5e4/NaV2 #CheB demethylates receptor complex k_Tc_demeth 2e4/NaV2 #CheB demethylates receptor complexend parametersbegin reaction rules LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Receptor complex (specifically CheA) autophosphorylation #Rate dependent on methylation and binding states #Also on free vs. 
bound with ligand TaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phos TbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1 TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8 TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0 TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8 TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6 #CheY phosphorylation by T and dephosphorylation by CheZ YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDep: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos #CheR binds to and methylates receptor complex #Rate dependent on methylation states and ligand binding TRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_dis TaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_meth TbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1 TaRLigandMeth: T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30 TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3 #CheB is phosphorylated by receptor complex, and autodephosphorylates CheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phos CheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephos #CheB demethylates receptor complex #Rate dependent on methylation states TbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demeth TcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demethend reaction rulesbegin compartments EC 3 100 #um^3 PM 2 1 EC #um^2 CP 3 1 PM #um^3end compartmentsbegin seed species @EC:L(t) L0 @PM:T(l,r,Meth~A,Phos~U) T0*0.84*0.9 @PM:T(l,r,Meth~B,Phos~U) T0*0.15*0.9 @PM:T(l,r,Meth~C,Phos~U) T0*0.01*0.9 @PM:T(l,r,Meth~A,Phos~P) T0*0.84*0.1 @PM:T(l,r,Meth~B,Phos~P) T0*0.15*0.1 @PM:T(l,r,Meth~C,Phos~P) T0*0.01*0.1 @CP:CheY(Phos~U) 
CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>800, n_steps=>800})Running our adaptation modelNow save your file and run the model. Because the model is at equilibrium, we will see the seemingly boring plot shown below.Things get interesting when we change the initial concentration of ligand to see how the simulated bacterium will adapt. Run your simulation with L0 = 1e6. What happens to CheY activity? What happens to the concentration of receptors at different methylation states?Try a variety of different initial concentrations of ligand (L0 = 1e4, 1e5, 1e6, 1e7, 1e8), paying attention to the concentration of phosphorylated CheY. How does the concentration change depending on initial ligand concentration?Then try to further raise the ligand concentration to 1e9 and 1e10. How does this affect the outcome of the simulation? Why?Next, try only simulating the first 10 seconds to zoom into what happens to the system at the start. How quickly does CheY concentration reach a minimum? How long does the cell take to return to the original concentration of phosphorylated CheY (i.e., the background tumbling frequency)?Back in the main text, we will examine how a sudden change in the concentration of unbound ligand can cause a quick change in the tumbling frequency of the bacterium, followed by a slow return to its original frequency. We will also see how the extent to which this tumbling frequency is disturbed is dependent upon differences in the initial concentration of ligand.Return to main text Spiro PA, Parkinson JS, and Othmer H. 1997. A model of excitation and adaptation in bacterial chemotaxis. Biochemistry 94:7263-7268. Available online. ↩ Bray D, Bourret RB, Simon MI. 1993. Computer simulation of phosphorylation cascade controlling bacterial chemotaxis. Molecular Biology of the Cell. 
Available online ↩ Li M, Hazelbauer GL. 2004. Cellular stoichiometry of the components of the chemotaxis signaling complex. Journal of Bacteriology. Available online ↩ Stock J, Lukat GS. 1991. Intracellular signal transduction networks. Annual Review of Biophysics and Biophysical Chemistry. Available online ↩ "
} ,
{
"title" : "Software Tutorial: Visualizing Glycans",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_glycans",
"date" : "",
"content" : "Here, we will show how to visualize glycans in VMD. Be sure to have installed VMD and know how to load molecules into the program. If you need a refresher, go to the VMD and Multiseq Tutorial. In the Visualizing Regions and Residues Tutorial, we went over how to change the visualizations of molecules and proteins in VMD. Please visit that tutorial first if you have not done so already.We will use the PDB entry of the SARS-CoV-2 Spike protein, 6vyb.First, download and load 6vyb into VMD and go to Graphics>Representations. For VMD, there is no specific keyword to select glycans. A workaround is to use the keywords: “not protein and not water”. To recreate the basic VMD visualizations of the glycans in the module, use the following representations. (For the protein chains, use Glass3 for Material).The end result should look like this:In the visualization you just created, the three chains in the S protein are in dark green, dark orange, and dark yellow. The presumed glycans are shown in red. Notice how they are all over the S protein! You may have noticed that one of the chains appears different, in that part of it sticks out from the rest of the protein. This is because the PDB entry 6vyb contains the structure of the SARS-CoV-2 S protein in its open conformation. Let’s return to the main text to see what that means.Return to main text"
} ,
{
"title" : "Software Tutorial: Traveling Up an Attractant Gradient",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_gradient",
"date" : "",
"content" : "In the previous tutorial, we modeled how bacteria react and adapt to a one-time addition of attractants. In real life, bacteria don’t suddenly drop into an environment with more attractants; instead, they explore a variable environment. In this tutorial, we will adapt our model to simulate a bacterium as it travels up an exponentially increasing concentration gradient.We will also explore defining and using functions, a feature of BioNetGen that will allow us to specify reaction rules in which the reaction rates are dependent on the current state of the system.To get started, create a copy of your adaptation.bngl file from the adaptation tutorial and save it as addition.bngl. If you would rather not follow along below, you can download a completed BioNetGen file here: addition.bnglWe also will build a Jupyter notebook in this tutorial for plotting the concentrations of molecules over time. You should create a file called plotter_up.ipynb; if you would rather not follow along, we provide a completed notebook here:plotter_up.ipynbBefore running this notebook, make sure the following dependencies are installed. Installation Link Version Check install/version Python3 3.6+ python --version Jupyter Notebook 4.4.0+ jupyter --version Numpy 1.14.5+ pip list \| grep numpy Matplotlib 3.0+ pip list \| grep matplotlib Colorspace or with pip any pip list \| grep colorspace Modeling an increasing ligand gradient with a BioNetGen functionOur BioNetGen model will largely stay the same, except for the fact that we are changing the concentration of ligand over time. To model an increasing concentration of ligand corresponding to a bacterium moving up an attractant gradient, we will increase the background ligand concentration at an exponential rate.We will simulate an increase in attractant concentration by using a “dummy reaction” L → 2L in which one ligand molecule becomes two. 
To do so, we will add the following reaction to the reaction rules section.As we have observed earlier in this module, when the ligand concentration is very high, receptors are saturated, and the cell can no longer detect a change in ligand concentration. If you explored the adaptation simulation, then you saw that this occurs after L0 passes 1e8; we will therefore cap the allowable ligand concentration at this value.We can cap our ligand concentration by defining the rate of the dummy reaction using a function addRate(). This function requires another observable, AllLigand. By adding the line Molecules AllLigand L in the observables section, AllLigand will record the total concentration of ligand in the system at each time step (both bound and unbound). As for our reaction, if AllLigand is less than 1e8, then the dummy reaction should take place at some given rate k_add. Otherwise, AllLigand exceeds 1e8, and we will set the rate of the dummy reaction to zero. This can be achieved with a functions section in BioNetGen using the following if statement to branch based on the value of AllLigand.Note: Please ensure that the functions section occurs before the reaction rules section in your BioNetGen file.begin functions addRate() = if(AllLigand>1e8,0,k_add)end functionsNow we are ready to add our dummy reaction to the reaction rules section with a reaction rate of addRate().#Simulate an exponentially increasing gradient using a dummy reactionLAdd: L(t) -> L(t) + L(t) addRate()Now that we have defined our dummy reaction, we should specify the default rate of this reaction k_add in the parameters section. We will first try a value of k_add of 0.1/s with an initial ligand concentration L0 of 1e4. This means that the model is simulating a gradient of d[L]/dt = 0.1[L]. 
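Before running the model, it can help to see what this capped gradient looks like on its own. The following is a minimal plain-Python sketch, separate from BioNetGen, of the deterministic growth curve implied by the dummy reaction; the names k_add, L0, and the 1e8 cap mirror the parameters of the model, but the function itself is only illustrative.

```python
import math

# Deterministic sketch of the capped ligand growth implied by the dummy
# reaction L -> 2L with rate addRate(): k_add below the 1e8 cap, 0 above it.
k_add = 0.1   # per-second growth rate (the k_add parameter)
L0 = 1e4      # initial ligand count (the L0 parameter)
CAP = 1e8     # the AllLigand threshold used in addRate()

def ligand(t):
    """Ligand count at time t: exponential growth, capped at CAP."""
    return min(L0 * math.exp(k_add * t), CAP)

# Time at which the cap is reached: solve L0 * exp(k_add * t) = CAP
t_cap = math.log(CAP / L0) / k_add
```

With these values, t_cap = ln(1e4)/0.1, roughly 92 seconds, so for most of a 1000-second simulation the ligand concentration sits at the cap; this is worth keeping in mind when interpreting the plots later.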
If L0 is 1e4, then the solution to this differential equation is [L] = 10^4 · e^(0.1t) molecules.k_add 0.1L0 1e4Running our updated BioNetGen modelBecause we have largely kept the same model from the adaptation tutorial, we are ready to simulate. Please make sure that the following lines appear after end model so that we can run our simulation for 1000 seconds.generate_network({overwrite=>1})simulate({method=>"ssa", t_end=>1000, n_steps=>500})The following code contains our complete simulation, which you can also download here:addition.bnglbegin modelbegin molecule types L(t) T(l,r,Meth~A~B~C,Phos~U~P) CheY(Phos~U~P) CheZ() CheB(Phos~U~P) CheR(t)end molecule typesbegin observables Molecules bound_ligand L(t!1).T(l!1) Molecules phosphorylated_CheY CheY(Phos~P) Molecules low_methyl_receptor T(Meth~A) Molecules medium_methyl_receptor T(Meth~B) Molecules high_methyl_receptor T(Meth~C) Molecules phosphorylated_CheB CheB(Phos~P) Molecules AllLigand Lend observablesbegin parameters NaV2 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 miu 1e-6 L0 1e4 T0 7000 CheY0 20000 CheZ0 6000 CheR0 120 CheB0 250 k_lr_bind 8.8e6/NaV2 #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_TaUnbound_phos 7.5 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV2 #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV2 #Z dephosphorylates Y k_TR_bind 2e7/NaV2 #Receptor-CheR binding k_TR_dis 1 #Receptor-CheR dissociation k_TaR_meth 0.08 #CheR methylates receptor complex k_B_phos 1e5/NaV2 #CheB phosphorylation by receptor complex k_B_dephos 0.17 #CheB autodephosphorylation k_Tb_demeth 5e4/NaV2 #CheB demethylates receptor complex k_Tc_demeth 2e4/NaV2 #CheB demethylates receptor complex k_add 0.1 #Ligand increaseend parametersbegin functions addRate() = if(AllLigand>1e8,0,k_add)end functionsbegin reaction rules LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Receptor complex (specifically CheA) autophosphorylation #Rate dependent on 
methylation and binding states #Also on free vs. bound with ligand TaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phos TbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1 TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8 TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0 TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8 TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6 #CheY phosphorylation by T and dephosphorylation by CheZ YPhos: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDephos: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos #CheR binds to and methylates receptor complex #Rate dependent on methylation states and ligand binding TRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_dis TaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_meth TbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1 TaRLigandMeth: T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30 TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3 #CheB is phosphorylated by receptor complex, and autodephosphorylates CheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phos CheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephos #CheB demethylates receptor complex #Rate dependent on methylation states TbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demeth TcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demeth #Simulate exponentially increasing gradient LAdd: L(t) -> L(t) + L(t) addRate()end reaction rulesbegin compartments EC 3 100 #um^3 PM 2 1 EC #um^2 CP 3 1 PM #um^3end compartmentsbegin seed species @EC:L(t) L0 @PM:T(l,r,Meth~A,Phos~U) T0*0.84*0.9 @PM:T(l,r,Meth~B,Phos~U) T0*0.15*0.9 @PM:T(l,r,Meth~C,Phos~U) 
T0*0.01*0.9 @PM:T(l,r,Meth~A,Phos~P) T0*0.84*0.1 @PM:T(l,r,Meth~B,Phos~P) T0*0.15*0.1 @PM:T(l,r,Meth~C,Phos~P) T0*0.01*0.1 @CP:CheY(Phos~U) CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>1000, n_steps=>500})Save your file, then go to Simulation and click Run. What happens to the concentration of phosphorylated CheY?Note: You can deselect AllLigand to make the plot of the concentration of phosphorylated CheY easier to see.Try a few different values for k_add: 0.01, 0.03, 0.05, 0.1, 0.3, 0.5. What do these changing k_add values represent in the simulation? How does the system respond to the different values?All of your simulation results are stored in the RuleBender-workspace/PROJECT_NAME/results/MODEL_NAME/TIME/ directory on your computer. For simplicity, rename each results directory after its k_add value instead of the time of the run.You will observe that CheY phosphorylation drops gradually at first, instead of showing the instantaneous sharp drop we saw when adding a large amount of ligand at once. That means that as the ligand concentration increases, the cell is able to continuously lower its tumbling frequency.Visualizing the results of our simulationWe are now ready to fill in plotter_up.ipynb, a Jupyter notebook that we will use to visualize the outcome of our simulations.First specify the directories, model name, species of interest, and rates. 
Put the RuleBender-workspace/PROJECT_NAME/results/MODEL_NAME/ folder inside the same directory as plotter_up.ipynb or change the model_path below to point at this folder.#Specify the data to plot here.model_path = "addition" #The folder containing the modelmodel_name = "addition" #Name of the modeltarget = "phosphorylated_CheY" #Target moleculevals = [0.01, 0.03, 0.05, 0.1, 0.3, 0.5] #Gradients of interestWe next provide some import statements for needed dependencies.import numpy as npimport sysimport osimport matplotlib.pyplot as pltimport colorspaceTo compare the responses for different gradients, we color-code each gradient. Colorspace is a straightforward way to set up a color palette. Here we use a qualitative palette with hues (h) equally spaced between [0, 300], and constant chroma (c) and luminance (l) values.#Define the colors to usecolors = colorspace.qualitative_hcl(h=[0, 300.], c = 60, l = 70, palette = "dynamic")(len(vals))The following function loads and parses the data. Once the file containing your data is loaded, we use the first row to investigate which column stores the concentration of the “target” observable species of interest. When we find that target, we will then access the time points and concentrations of this target molecule.def load_data(val): file_path = os.path.join(model_path, str(val), model_name + ".gdat") with open(file_path) as f: first_line = f.readline() #Read the first line cols = first_line.split()[1:] #Get the col names (species names) ind = 0 while cols[ind] != target: ind += 1 #Get col number of target molecule data = np.loadtxt(file_path) #Load the file time = data[:, 0] #Time points concentration = data[:, ind] #Concentrations return time, concentrationNow we will write a function to plot the time coordinates on the x-axis and the concentrations of the molecule at these time points on the y-axis. To do so, we will use the Matplotlib plot function to plot concentrations through time for each gradient value. 
Time-series data will be colored by the color palette we mentioned earlier.def plot(val, time, concentration, ax, i): legend = "k = " + str(val) ax.plot(time, concentration, label = legend, color = colors[i]) ax.legend() returnThe plotting function above needs to be initialized with a figure defined by the subplot function. We loop through every gradient concentration to perform the plotting. Afterward, we define labels for the x-axis and y-axis, figure title, and tick lines. The call to plt.show() displays the plot.fig, ax = plt.subplots(1, 1, figsize = (10, 8))for i in range(len(vals)): val = vals[i] time, concentration = load_data(val) plot(val, time, concentration, ax, i)plt.xlabel("time (s)")plt.ylabel("concentration (#molecules)")plt.title("Phosphorylated CheY vs time")ax.minorticks_on()ax.grid(b = True, which = 'minor', axis = 'both', color = 'lightgrey', linewidth = 0.5, linestyle = ':')ax.grid(b = True, which = 'major', axis = 'both', color = 'grey', linewidth = 0.8 , linestyle = ':')plt.show()Now run the notebook. How do changing values of k_add impact the CheY-P concentrations? Why do you think this is?In the main text, we will examine the results of our plots and discuss how they can be used to infer the cell’s behavior in the presence of increasing attractant.Return to main text"
} ,
{
"title" : "Software Tutorial: Using Homology Modeling to Predict the Structure of the SARS-CoV-2 Spike Protein",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_homology",
"date" : "",
"content" : "In this software tutorial, we will apply three popular software resources (SWISS-MODEL, Robetta, and GalaxyWEB) that use homology modeling to predict the structure of the SARS-CoV-2 spike protein. Recall from the main text that this protein is a homotrimer, meaning that it consists of three identical protein structures called chains. In what follows, we will predict the structure of a single chain.The details of how the three software resources presented in this lesson differ are beyond the scope of our work in this course. If you are interested in understanding how they each implement homology modeling, then we suggest that you consult the documentation of the appropriate resource.SWISS-MODELTo run SWISS-MODEL, first download the sequence of the spike protein chain: SARS-CoV-2 spike protein chain.Next, go to the main SWISS-MODEL website and click Start Modelling.On the next page, copy and paste the sequence into the Target Sequence(s): box. Name your project and enter an email address to get a notification when your results are ready. Finally, click Build Model to submit the job request. Note that you do not need to specify that you want to use the SARS-CoV spike protein as a template because the software will automatically search for a template for you.Your results may take between an hour and a day to finish depending on how busy the server is. (In the meantime, feel free to run the remaining software.) When you receive an email notification, follow the link provided and you can download the final models.When we ran our own job, SWISS-MODEL did indeed use one of the PDB entries of SARS-CoV spike protein as its template (PDB: 6crx) and correctly recognized that the template was a homotrimer. As a result, the software predicted a complete spike protein with all three chains included. An image of our results can be seen below. You can also download our results. 
We will discuss how to interpret these results and the .pdb file format when we return to the main text.Structures of the three models of this protein reported by SWISS-MODEL. The superimposed structure of all three models is shown on the bottom right.RobettaRobetta is a publicly available service that uses the same software as the distributed Rosetta@home project that we mentioned earlier in this module. As with SWISS-MODEL, we will provide Robetta with a single chain of the SARS-CoV-2 spike protein.First, if you have not already done so, download the sequence of the chain: SARS-CoV-2 spike chain sequence.Next, visit Robetta and register for an account.Then, click Structure Prediction > Submit.Create a name for the job, e.g., “SARS-CoV-2 Spike Chain”. Copy and paste the downloaded sequence into the Protein sequence box. Check CM only (for homology modeling), complete the arithmetic problem provided to prove you are human, and then click Submit.You should receive an email notification with a link to your results within an hour to a day. In our own run, unlike SWISS-MODEL, Robetta did not deduce that the input protein was a trimer and only predicted a single chain. The structures from our own run of Robetta are shown in the figure below. You can also download our results if you like.The homology models produced by Robetta of one of the chains of the SARS-CoV-2 spike protein. The superimposition of all structures is shown on the bottom right.GalaxyWEBGalaxyWEB is a server with many available services for protein study, including protein structure prediction. GalaxyTBM (the template-based modeling service) uses HHsearch to identify up to 20 templates, and then matches the core sequence with the templates using PROMALS3D. Next, models are generated using MODELLERCSA.Because GalaxyWEB has a sequence limit of 1000 amino acids, we cannot use the entire spike protein chain. 
Instead, we will model the receptor binding domain (RBD) of the spike protein, which we introduced in the main text as a variable domain within the spike protein’s S1 subunit.First, download the sequence of the RBD.Then, visit the GalaxyWEB homepage. At the top, click Services > TBM.Enter a job name, e.g., SARS-CoV-2 RBD. Enter an email address and then copy and paste the RBD sequence into the SEQUENCE box. Finally, click Submit.You should receive an email notification within a day with a link to your results. The results of our run of GalaxyWEB along with the validated structure of the SARS-CoV-2 RBD (PDB entry: 6lzg) are visualized in the figure below. You can also download our results if you like.Homology models predicted by GalaxyWEB for the SARS-CoV-2 spike protein RBD. The superimposition of all these structures is shown on the bottom right.Interpreting the results of our software runsIn the figures above, the structures predicted by the three software resources appear to be reasonably accurate. But throughout this course, we have prioritized using quantitative metrics to analyze results. As we return to the main text, our question is how to develop a quantitative metric to compare the results of these models to each other and to the structure of the SARS-CoV-2 spike protein.Return to main text"
} ,
{
"title" : "Software Tutorial: Binarizing Images",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/tutorial_image_binarization",
"date" : "",
"content" : "Step 2 Image Binarization and File ConversionAs mentioned in Step 1, we would ideally have black and white segmented images as a result of running the above commands. However, we would like to ensure that our segmented images are in black and white, and if not (suppose they are in greyscale), we would like to convert them into this form. Furthermore, the CellOrganizer PCA method requires all images to be in TIFF format, so this step handles that file conversion as well. To go an extra step, we also want another set of images that show the segmented nuclei in color while the background is black.In this step of the pipeline, we open up MATLAB for running the binarization and file conversion code.Open MATLAB. Then, run the following commands in the MATLAB command window:> clear> clc> cd ~/Desktop/WBC_PCAPipeline/Step2_Binarization> WBC_imgBinAs a result, the BWImgs_1 directory will now contain binarized TIFF versions of the segmented images. That is, each greyscale image resulting from the nuclear segmentation step will have pixel values of strictly 0 (black) or 1 (white).Our other result is that the ColNuc_1 directory will now contain TIFF versions of the segmented images where the nuclei are in color and the background is black. We won’t be using these images further along the pipeline, but they are useful to look at for visual confirmation that the majority of the nucleus is being considered for the PCA model.Nuclear segmentation of BloodImage_00001.jpg in black and white.Nuclear segmentation of BloodImage_00001.jpg with color retained in the nucleus."
} ,
{
"title" : "Software Tutorial: Training a Classifier on an Image Shape Space",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/tutorial_image_classification",
"date" : "",
"content" : "Step 5 ClassificationTo use our PCA shape space coordinates in Weka, we need to convert the WBC_PCA.csv file into the ARFF format that Weka accepts.Method 1: Weka for File ConversionOpen Weka and navigate to Tools --> ArffViewer.Then navigate to File --> Open.Change the Files of Type option to CSV data files.Find the WBC_PCA.csv file in your Step4_ShapeSpaceVisualization folder and click Open.Once all the data is loaded on screen, navigate to File --> Save as ….Locate the Step5_Classification folder, remove the .csv extension in the File Name field, and click Save.As a result, our PCA pipeline coordinates have now been converted to the file format that Weka accepts for further classification. This file should be saved as WBC_PCA.arff in the Step5_Classification subfolder of the WBC_CellClass folder."
} ,
{
"title" : "Software Tutorial: Getting Started with BioNetGen and Modeling Ligand-Receptor Dynamics",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_lr",
"date" : "",
"content" : "This collection of tutorials will gradually build up a chemotaxis simulation from scratch using BioNetGen.In this tutorial, we will: set up BioNetGen; explore several key aspects of BioNetGen modeling (rules, species, simulation methods, and parameters); and use BioNetGen to model ligand-receptor dynamics and compute a steady-state concentration of ligands and receptors.What is BioNetGen?BioNetGen is a software application for the specification and simulation of rule-based models. In past modules, we have worked with chemical reactions that can be thought of as rules (e.g., “whenever an X particle and a Y particle collide, replace them with a single X particle”). The chemotaxis pathway can also be thought of as a set of biochemical rules specifying a set of mathematical equations dictating molecule concentrations. Our larger goal is to use BioNetGen to translate these rules into a reasonable chemotaxis simulation, then visualize and interpret the results.In this tutorial, we will focus only on modeling ligand-receptor dynamics, which we will use as a starting point for more advanced modeling later.Installation and setupRuleBender is the graphical interface for BioNetGen. Please download the version corresponding to your operating system. Here is a step-by-step installation guide.Starting with Ligand-Receptor DynamicsIn this tutorial, we will build our model from scratch. If you like instead, you can download the completed simulation file here:ligand_receptor.bnglIn our system, there are only two types of molecules: the ligand (L), and the receptor (T). (The receptor is in fact a receptor complex because it is attached to additional molecules, which we will elaborate on later). The ligand can bind to the receptor, forming an intermediate, and the complex can also dissociate. 
We write this reaction as L + T <-> L.T, where the formation of the intermediate is called the forward reaction, and the dissociation is called the reverse reaction.In our system, which starts with a quantity of free ligands and receptors, the numbers of these free molecules should drop quickly, because free ligands and free receptors readily meet each other. After a while, there will be more L.T in the system and therefore more dissociation; at the same time, because free L and T are less abundant, less binding happens. The system will gradually reach a steady state in which the rate of L and T binding equilibrates with L.T dissociation.We will simulate reaching this steady state, which means that we will need to know the following two parameters: The rate of the forward reaction: k_lr_bind [L][T], where k_lr_bind is the rate constant. The rate of the reverse reaction: k_lr_dis[L.T], where k_lr_dis is the rate constant.Equilibrium is reached when k_lr_bind [L][T] = k_lr_dis[L.T]. Our goal in this tutorial is to use BioNetGen to determine these equilibrium molecule concentrations as a proof of concept.First, open RuleBender and select File > New BioNetGen Project.Save your file as ligand_receptor.bngl. Now you should be able to start coding on line 1.Specifying molecule typesWe will specify everything needed for this tutorial, but if you are interested, the BioNetGen reference documentation can be found here.To specify our model, add begin model and end model. Everything below regarding the specification of the model will go between these two lines.We first add ligand and receptor molecules to our model under a molecule types section. 
Recall from the main text that we will call these molecules L(t) and T(l).begin modelbegin molecule types L(t) T(l)end molecule typesend modelSpecifying reaction rules and observablesAs discussed in the main text, the ligand-receptor simulation will only need to apply a single bidirectional reaction.begin reaction rules LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_disend reaction rulesOnce we have specified reactions, we can define the molecules whose concentrations we are interested in tracking. These molecules are added to an observables section.begin observables Molecules free_ligand L(t) Molecules bound_ligand L(t!1).T(l!1) Molecules free_receptor T(l)end observablesInitializing unbound molecule countsNext, we need to specify a variable indicating the number of molecules with which we would like to initialize our simulation. We place these molecules within a seed species section. We start with L0 unbound L molecules and T0 unbound T molecules; we will set these parameters later.Note that we do not specify an initial number of bound L.T complexes, meaning that the initial concentration of these complexes will be equal to zero.begin seed species L(t) L0 T(l) T0end seed speciesSpecifying parametersNow we will declare all the parameters we introduced in the above sections. We will start with setting L0, the initial concentration of ligand, to 10,000, and T0, the initial concentration of receptors, to 7000. It remains to set the reaction rates for the forward and reverse reactions.BioNetGen is unitless, but for simplicity, we will assume that all concentrations are measured in the number of molecules per cell. Reaction rate constants are conventionally expressed in molar units (M), where 1 M corresponds to Avogadro’s number of molecules, approximately 6.02 · 10^23, per liter.Because of the differing units of molecules per cell and molar concentration, we need to do some unit conversion here. The volume of an E. 
coli cell is approximately 1 µm^3, and so 1 mole per liter will correspond to 1 mole per 10^15 µm^3, or about 6.02 · 10^8 molecules per cell.For bimolecular reactions, the rate constant should have units of M^-1 s^-1, and we divide by NaV to convert to (molecules/µm^3)^-1 s^-1. For monomolecular reactions, the rate constant has units of s^-1, so no unit conversion is required.Although the specific numbers of cellular components vary from bacterium to bacterium, the components in the chemotaxis pathway occur in relatively constant ratios. For all the simulations in this tutorial, we assign the initial number for each molecule and reaction rate by first deciding a reasonable range based on in vivo quantities [1, 2, 3]. Our parameters are summarized below.begin parameters NaV 6.02e8 #Unit conversion M -> #/µm^3 L0 1e4 #number of ligand molecules T0 7000 #number of receptor complexes k_lr_bind 8.8e6/NaV #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociationend parametersNote: The parameters section has to appear before the reaction rules section.If you save your file, then you should see a “contact map” in the upper right corner of the window indicating the potential bonding of L and T. This contact map is currently very simplistic, but for more complicated simulations it can help visualize the interaction of species in the system.Specifying simulation commandsWe are now ready to run our simulation. At the bottom of the model specification (i.e., after end model), we will add a generate_network and simulate command. The simulate command will take three parameters, which we specify below.Method. We will use method=>"ssa" throughout these tutorials, which indicates that we are using the SSA (Gillespie) algorithm that was described in the main text. BioNetGen also includes the parameters method=>"nf" (network-free) and method=>"ode" (ordinary differential equations) that you can try. 
See the following article for more details on the network-free and ODE approaches: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5079481.Time span. t_end, the simulation duration. BioNetGen simulation time is unitless; for simplicity, we assume our time unit is the second.Number of Steps. n_steps tells the program how many time points to break the simulation into when measuring the concentrations of our observables. generate_network({overwrite=>1}) simulate({method=>"ssa", t_end=>1, n_steps=>100})The following code contains our complete simulation, which you can also download here:ligand_receptor.bngl.begin modelbegin molecule types L(t) T(l)end molecule typesbegin observables Molecules free_ligand L(t) Molecules bound_ligand L(t!1).T(l!1) Molecules free_receptor T(l)end observablesbegin parameters NaV 6.02e8 #Unit conversion M -> #/um^3 L0 1e4 #number of ligand molecules T0 7000 #number of receptor complexes k_lr_bind 8.8e6/NaV #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociationend parametersbegin reaction rules LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_disend reaction rulesbegin seed species L(t) L0 T(l) T0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>1, n_steps=>100})STOP: Based on our results from calculating steady-state concentration by hand in the main text, predict how the concentrations will change and what the equilibrium concentrations will be.Running our simulationWe are now ready to run our simulation. To do so, visit Simulation at the right side of the contact map and click Run. You can then visualize the results of the simulation, showing changes in concentration over time. These results are also stored as a .gdat file in a results folder labeled with the time of your simulation.Is the result you obtain what you expected? 
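One way to check your prediction is to solve the steady-state condition directly: at equilibrium, k_lr_bind · [L] · [T] = k_lr_dis · [LT], and conservation of ligand and receptor turns this into a quadratic in the number of bound complexes. The following Python sketch (ours, not part of the tutorial's files) solves it for the parameter values above:

```python
import math

# Equilibrium of L + T <-> LT for the parameter values above.
NaV = 6.02e8
L0, T0 = 1e4, 7000          # initial free ligand and receptor counts
k_bind = 8.8e6 / NaV        # forward rate, (molecules/um^3)^-1 s^-1
k_dis = 35.0                # reverse rate, s^-1

# k_bind*(L0 - x)*(T0 - x) = k_dis*x  =>  x^2 - (L0 + T0 + Kd)*x + L0*T0 = 0
Kd = k_dis / k_bind
b = L0 + T0 + Kd
bound = (b - math.sqrt(b * b - 4 * L0 * T0)) / 2  # physically meaningful root

print(round(bound))       # ~4795 bound complexes
print(round(L0 - bound))  # ~5205 free ligand molecules
print(round(T0 - bound))  # ~2205 free receptors
```

The stochastic trajectories from the SSA simulation should fluctuate around these values once the system settles.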
In the main text, we will return to this question and then learn more about the details of bacterial chemotaxis in order to expand our BioNetGen model into one that fully reflects these details.Return to main text Li M, Hazelbauer GL. 2004. Cellular stoichiometry of the components of the chemotaxis signaling complex. Journal of Bacteriology. Available online ↩ Spiro PA, Parkinson JS, and Othmer H. 1997. A model of excitation and adaptation in bacterial chemotaxis. Biochemistry 94:7263-7268. Available online. ↩ Stock J, Lukat GS. 1991. Intracellular signal transduction networks. Annual Review of Biophysics and Biophysical Chemistry. Available online ↩ "
} ,
{
"title" : "Software Tutorial: Finding Local Differences in the SARS-CoV and SARS-CoV-2 Spike Protein Structures",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_multiseq",
"date" : "",
"content" : "In this tutorial, we will get started with VMD and then calculate Qres between the SARS-CoV-2 RBD (PDB entry: 6vw1) and SARS-CoV RBD (PDB entry: 2ajf) using the VMD plugin Multiseq. By locating regions with low Qres, we can hopefully identify regions of structural differences between the two RBDs.Multiseq aligns two protein structures using a tool called Structural Alignment of Multiple Proteins (STAMP). Much like the Kabsch algorithm considered in part 1 of the module, STAMP minimizes the distance between alpha carbons of the aligned residues for each protein or molecule by applying rotations and translations. If the structures do not have common structures, then STAMP will fail. For more details on the algorithm used by STAMP, click here.Getting startedFor this tutorial, first download VMD. Throughout this tutorial, the program may prompt you to download additional protein database information, which you should accept.We will need to download the .pdb files for 6vw1 and 2ajf. Visit the 6vw1 and 2ajf PDB pages. For each protein, click Download Files and select PDB Format. The following figure shows this for 6vw1.Aligning the RBD regions of two spike proteinsNext, launch VMD, which will open three windows. We will not use VMD.exe, the console window, in this tutorial. We will load molecules and change visualizations in VMD Main. Finally, we will use OpenGL Display to display our visualizations.We will first load the SARS-CoV-2 RBD (6vw1) into VMD. In VMD Main, go to File > New Molecule. Click Browse, select your downloaded file (6vw1.pdb) and click Load.The molecule should now be listed in VMD Main, with its visualization in OpenGL Display.In the OpenGL Display window, you can click and drag the molecule to change its orientation. Pressing ‘r’ on your keyboard allows you to rotate the molecule, pressing ‘t’ allows you to translate the molecule, and pressing ‘s’ allows you to enlarge or shrink the molecule (or you can use your mouse’s scroll wheel). 
Note that left click and right click have different actions.We now will need to load the SARS-CoV RBD (2ajf). Repeat the above steps for 2ajf.pdb.After both molecules are loaded into VMD, start up Multiseq by clicking on Extensions > Analysis > Multiseq.You will see all the chains listed for each file. Both PDB files contain two biological assemblies of the structure. The first is made up of Chain A (ACE2) and Chain E (RBD), and the second is Chain B (ACE2) and Chain F (RBD). Because Chain A is identical to Chain B, and Chain E is identical to Chain F, we only need to work with one assembly. (We will use the second assembly.)Because we only want to compare the RBDs, we will only keep chain F of each structure. To remove the other chains, select the chain and click Edit > Cut.Click Tools > Stamp Structural Alignment, and a new window will open up.Keep the default values and click OK; once the alignment completes, the RBD regions will have been aligned.Visualizing a structural alignmentNow that we have aligned the two RBD regions, we would like to compare their Qres values over the entire RBD. To see a coloring of the protein alignment based on Qres, click View > Coloring > Qres.Blue indicates a high value of Qres, meaning that the protein structures are similar at this position; red indicates low Qres and dissimilar protein structures.The OpenGL Display window will now color the superimposed structures according to the value of Qres.We are looking for regions of consecutive amino acids having low Qres, which correspond to locations in which the coronavirus RBDs differ structurally. You may like to explore the alignments yourself to look for regions of interest before we head back to the main text and discuss our results.Return to main text"
} ,
{
"title" : "Software Tutorial: Segmenting Nuclei from Cellular Images",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/tutorial_nuclear_segmentation",
"date" : "",
"content" : "Cell Organizer TutorialFor the following steps, a variety of software and programs are used to go through our PCA Pipeline for the Kaggle White Blood Cells dataset. Please ensure that the following pre – requisite applications have been installed before continuing. Required Applications Terminal Command to Check Version Python (v. 3.7.3) python –version R (v. 3.5.2) R –version MATLAB (2016b or newer) - CellOrganizer (v. 2.8.x) - RStudio - Furthermore, we ask that you download our WBC_PCAPipeline folder onto your desktop and verify that it has the following contents:WBC_PCAPipeline Data RawImgs BloodImage_00001.jpg · · · BloodImage_00410.jpg WBC_Labels.csv Step1_Segmentation WBC_imgSeg.R Step2_Binarization WBC_imgBin.m Step3_PCAModel WBC_PCAModel.m Step4_ShapeSpaceVisualization WBC_SS_CellClass.py WBC_SS_CellType.py Step5_Classification README.pdfNote: Please ensure the WBC_PCAPipeline file is onto your desktop. Otherwise, you will have to manually change all file paths to point to the appropriate folders on your computer. While unconventional, we’ve noticed occasional software glitches with using setwd() or pwd() otherwise.Now open a terminal window and navigate into your WBC_PCAPipeline directory by running the following command:> cd ~/Desktop/WBC_PCAPipelineStep 1 Nuclear SegmentationFor this dataset, we would like to identify the white blood cell types by the nuclear shape since that particular feature is easy to verify with the naked eye. Moreover, the nucleus of the white blood cell(s) in each image are of a distinctly darker color than the rest of the red blood cells or platelets in the image. This allows us to implement a technique called thresholding. In thresholding, we examine each pixel in the image and reset the color value of the image according to our thresholds. If the original RGB values for the pixel are above the thresholds we set in each channel, then we reset the pixel value to white. 
All other pixels below the thresholds will be set to black (ideally). This way, our target, the white blood cell nucleus, is a white blob in a black background.Open RStudio, and navigate to File --> Open File, and find Desktop/WBC_PCAPipeline/Step1_Segmentation/WBC_imgSeg.R.Should you be asked in the console about upgrading dependencies during the EBImage library installation, type in a and hit enter.Note: If you source the file multiple times, three directories are created each time within the Data folder with the form of SegImgs_i, ColNuc_i, and BWImgs_i, where i is an integer. The images are only segmented into the most recently created directories (those with the largest value of i). Should you run into trouble and need to run this file multiple times, ensure that future file paths are pointing to the right folders!After we have sourced our R file, you’ll notice the creation of three directories of the form: SegImgs_1, ColNuc_1, and BWImgs_1 within the Data folder.Assuming the file ran correctly, the first directory, SegImgs_1, contains all of the segmented nuclei images where the white blood cell nucleus is in white and the rest of the image is seemingly in black. The second directory, ColNuc_1, should be empty, but will eventually contain all of the segmented nuclei images; however, the white blood cell nucleus will be in its original color and the rest of the image will be in black. Finally, the third directory, BWImgs_1, should be empty, but will eventually hold binarized versions (strictly black and white) of the images in SegImgs_1.Nuclear Segmentation Example using BloodImage_00001.jpg.Greyscale segmented nucleus from the above image.Return to main text"
} ,
{
"title" : "Software Tutorial: Adding Phosphorylation to our BioNetGen Model",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_phos",
"date" : "",
"content" : "In this tutorial, we will extend the BioNetGen model covered in the ligand-receptor tutorial to add the phosphorylation chemotaxis mechanisms described in the main text, shown in the figure reproduced below.To get started, create a copy of your file from the ligand-receptor tutorial and save it as phosphorylation.bngl. If you would rather not follow along below, you can download a completed BioNetGen file here:phosphorylation.bnglDefining moleculesFirst, we introduce a state into the receptor and CheY particles to mark whether they are is phosphorylated or not. The notation Phos~U~P indicates we introduce phosphorylation states to a molecule (U indicates unphosphorylated, and P indicates phosphorylated). We also add molecule CheY(Phos~U~P) and CheZ().Note: Be careful with the use of spaces; don’t put spaces after the comma in the specification of the receptor.)begin molecule types L(t) #ligand molecule T(l,Phos~U~P) #receptor complex CheY(Phos~U~P) CheZ()end molecule typesDuring this simulation, we are interested in tracking the concentration of phosphorylated CheY and CheA (receptor complex) along with the concentration of the ligand.begin observables Molecules phosphorylated_CheY CheY(Phos~P) Molecules phosphorylated_CheA T(Phos~P) Molecules bound_ligand L(t!1).T(l!1)end observablesDefining reactionsNow we are ready to update our reaction rules to include phosphorylation and dephosphorylation in addition to the ligand-receptor reaction. These rules were discussed in the main text and are reproduced below.begin reaction rules LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Free vs. 
ligand-bound complexes autophosphorylate at different rates FreeTP: T(l,Phos~U) -> T(l,Phos~P) k_T_phos BoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*0.2 YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDep: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephosend reaction rulesInitializing molecules and parametersTo initialize our simulation, we need to indicate the number of molecules in each state present at the beginning of the simulation. Since ligand is added only at the start of the simulation, the initial number of molecules in each state should equal the equilibrium concentrations when no ligand is present. To this end, we set the amount of phosphorylated receptor equal to one-fourth the concentration of unphosphorylated receptor, and the concentration of phosphorylated CheY equal to the concentration of unphosphorylated CheY.Note: These ratios were determined through trial and error.begin seed species L(t) L0 T(l,Phos~U) T0*0.8 T(l,Phos~P) T0*0.2 CheY(Phos~U) CheY0*0.5 CheY(Phos~P) CheY0*0.5 CheZ() CheZ0end seed speciesWe now set initial quantities of molecules along with reaction rate parameters to be consistent with in vivo quantities [1,2,3].begin parameters NaV 6.02e8 #Unit conversion to cellular concentration M -> #/um^3 L0 0 #number of ligand molecules T0 7000 #number of receptor complexes CheY0 20000 CheZ0 6000 k_lr_bind 8.8e6/NaV #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_T_phos 15 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV #receptor complex phosphorylates CheY k_Y_dephos 8.6e5/NaV #CheZ dephosphorylates CheYend parametersNote: The parameters section has to appear before the reaction rules section.Place everything occurring above between begin model and end model tags.Simulating responses to attractantsBefore running the simulation, let’s think about what will happen. 
If we don’t add any ligand molecules to the system, then, assuming that we start the simulation at steady state, the concentrations of phosphorylated receptors and CheY will remain at equilibrium.We can now run the simulation, setting t_end equal to 3 in order to run the simulation for longer than we did in the ligand-receptor tutorial. Place the following code after end model in your BioNetGen file.generate_network({overwrite=>1})simulate({method=>"ssa", t_end=>3, n_steps=>100})The following code contains our complete simulation, which you can also download here:phosphorylation.bnglbegin modelbegin molecule types L(t) #ligand molecule T(l,Phos~U~P) #receptor complex CheY(Phos~U~P) CheZ()end molecule typesbegin observables Molecules phosphorylated_CheY CheY(Phos~P) Molecules phosphorylated_CheA T(Phos~P) Molecules bound_ligand L(t!1).T(l!1)end observablesbegin parameters NaV 6.02e8 #Unit conversion to cellular concentration M -> #/um^3 L0 5e3 #number of ligand molecules T0 7000 #number of receptor complexes CheY0 20000 CheZ0 6000 k_lr_bind 8.8e6/NaV #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_T_phos 15 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV #Z dephosphorylates Yend parametersbegin reaction rules LR: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Free vs. 
ligand-bound receptor complexes autophosphorylate at different rates FreeTP: T(l,Phos~U) -> T(l,Phos~P) k_T_phos BoundTP: L(t!1).T(l!1,Phos~U) -> L(t!1).T(l!1,Phos~P) k_T_phos*0.2 YP: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDep: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephosend reaction rulesbegin seed species L(t) L0 T(l,Phos~U) T0*0.8 T(l,Phos~P) T0*0.2 CheY(Phos~U) CheY0*0.5 CheY(Phos~P) CheY0*0.5 CheZ() CheZ0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>3, n_steps=>100})Now save your file and run the simulation by clicking Run under Simulate. What do you observe?When we add ligand molecules into the system, as we did in the tutorial for ligand-receptor dynamics, the concentration of bound receptors should increase. What will happen to the concentration of phosphorylated CheA and phosphorylated CheY? What will happen to the steady-state concentrations?Now run your simulation with L0 equal to 5000, and then run it again with L0 equal to 1e5. Do the results confirm your hypothesis? What happens as we keep changing L0? What happens as L0 gets really large (e.g., 1e9)? What do you think is going on?In the main text, we will explore the results of the above simulation. We will then interpret how differences in the amounts of initial ligand can influence changes in the concentration of phosphorylated CheY (and therefore the bacterium’s tumbling frequency).Return to main text Li M, Hazelbauer GL. 2004. Cellular stoichiometry of the components of the chemotaxis signaling complex. Journal of Bacteriology. Available online ↩ Spiro PA, Parkinson JS, and Othmer H. 1997. A model of excitation and adaptation in bacterial chemotaxis. Biochemistry 94:7263-7268. Available online. ↩ Stock J, Lukat GS. 1991. Intracellular signal transduction networks. Annual Review of Biophysics and Biophysical Chemistry. Available online ↩ "
} ,
{
"title" : "Software Tutorial: Modeling a Pure Random Walk Strategy",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_purerandom",
"date" : "",
"content" : "In this tutorial, we will simulate a random walk and take a look at how well this allows a bacterium to reach a goal. You might not anticipate that the random walk will do a very good job of this – and you would not be wrong – but it will give us a baseline simple strategy to compare against a more advanced random walk strategy.Specifically, we will build a Jupyter notebook to do so. You can create a blank file called chemotaxis_std_random.ipynb and type along, but the notebook will be quite lengthy, so feel free to download the final notebook here if you like: chemotaxis_std_random.ipynb. A detailed explanation of the model and each function can be found in this completed file as well as the tutorial below.Make sure that the following dependencies are installed: Installation Link Version Check install/version Python3 3.6+ python --version Jupyter Notebook 4.4.0+ jupyter --version Numpy 1.14.5+ pip list | grep numpy Matplotlib 3.0+ pip list | grep matplotlib Colorspace or with pip any pip list | grep colorspace Converting a run-and-tumble model to a random walk simulationOur model will be based on observations from our BioNetGen simulation and known biology of E. coli. We summarize this simulation, discussed in the main text, as follows. Run. The duration of a cell’s run follows an exponential distribution with mean equal to the background run duration time_exp. Tumble. The duration of a cell’s tumble follows an exponential distribution with mean 0.1s1. When it tumbles, we assume it only changes its orientation for the next run but doesn’t move in space. The degree of reorientation is a random number sampled uniformly between 0° and 360°. Gradient. We model an exponential gradient with a goal (1500, 1500) having a concentration of 108. All cells start at the origin (0, 0), which has a concentration of 102. 
The ligand concentration at a point (x, y) is given by L(x, y) = 100 · 10^(6 · (1 - d/D)), where d is the distance from (x, y) to the goal, and D is the distance from the origin to the goal; in this case, D is 1500√2 ≈ 2121 µm. (Note that this formula gives L = 10^2 at the origin and L = 10^8 at the goal.)First, we will import all packages needed.import numpy as npimport matplotlib.pyplot as pltimport mathfrom matplotlib import colorsfrom matplotlib import patchesimport colorspaceNext, we specify all the model parameters: a mean tumble time of 0.1 s and a cell speed of 20 µm/s [2].We also set a “seed” of our pseudorandom number generator to ensure that the sequence of “random” numbers given to us by Python will be the same every time we run the simulation. To obtain a different outcome, change the seed.Note: For more on seeding, please consult the discussion of pseudorandom number generation at Programming for Lovers.SEED = 128 #Any random seednp.random.seed(SEED) #set seed for Numpy random number generator#Constants for E.coli tumblingtumble_time_mu = 0.1 #second#E.coli movement constantsspeed = 20 #um/s, speed of E.coli movement#Model constantsstart = [0, 0] #All cells start at [0, 0]ligand_center = [1500, 1500] #Position of highest concentrationcenter_exponent, start_exponent = 8, 2 #exponent for concentration at [1500, 1500] and [0, 0]origin_to_center = 0 #Distance from start to center, initialized here, will be calculated latersaturation_conc = 10 ** 8 #From BNG modelWe now will have two functions that will establish the ligand concentration at a given point (x, y) as equal to L(x, y) = 100 · 10^(6 · (1 - d/D)).First, we introduce a function to compute the distance between two points in two-dimensional space.# Calculates distance between point a and b# Input: positions a, b. 
Each in the form array [x, y]# Returns the distance, a float.def distance(a, b): return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)Next, we define a function to determine the concentration of ligand at a given position according to our formula, which will use distance as a subroutine.# Calculates the concentration of a given position# Exponential gradient, the exponent follows a linear relationship with distance to center# Input: position pos, [x, y]# Returns the concentration, a float.def calc_concentration(pos): dist = distance(pos, ligand_center) exponent = (1 - dist / origin_to_center) * (center_exponent - start_exponent) + start_exponent return 10 ** exponentThe following tumble_move function chooses a direction of movement as a uniform random number between 0 and 2π radians. As noted previously, the duration of a cell’s tumble follows an exponential distribution with mean equal to 0.1s.# Samples the new direction and time of a tumble# Calculates projection on the Horizontal and Vertical direction for the next move# No input# Return the horizontal movement projection (float), the vertical one (float), tumble time (float)def tumble_move(): #Sample the new direction uniformly from 0 to 2pi, record as a float new_dir = np.random.uniform(low = 0.0, high = 2 * math.pi) projection_h = math.cos(new_dir) #displacement projected on Horizontal direction for next run, float projection_v = math.sin(new_dir) #displacement projected on Vertical direction for next run, float #Length of the tumbling sampled from exponential distribution with mean=0.1, float tumble_time = np.random.exponential(tumble_time_mu) return projection_h, projection_v, tumble_timeIn a given run of the simulation, we keep track of the total time t, and we only continue our simulation if t < duration, where duration is a parameter indicating how long to run the simulation. If t < duration, then we apply the following steps to a given cell. 
Sample the run duration curr_run_time from an exponential distribution with mean run_time_expected; run for curr_run_time seconds in the current direction; sample the duration of tumble tumble_time; determine the new direction of the simulated bacterium by calling the tumble_move function discussed above; increment t by curr_run_time and tumble_time.These steps are achieved by the simulate_std_random function below, which takes the number of cells to simulate (num_cells), the time to run each simulation (duration), and the mean duration of a single run (run_time_expected). This function stores the trajectories of these cells in a variable named path.# This function performs simulation# Input: number of cells to simulate (int), how many seconds (int), the expected run time before tumble (float)# Return: the simulated trajectories path: array of shape (num_cells, duration+1, 2)def simulate_std_random(num_cells, duration, run_time_expected): #Takes the shape (num_cells, duration+1, 2) #any point [x,y] on the simulated trajectories can be accessed via path[cell, time] path = np.zeros((num_cells, duration + 1, 2)) for rep in range(num_cells): # Initialize simulation t = 0 #record the time elapsed curr_position = np.array(start) #start at [0, 0] projection_h, projection_v, tumble_time = tumble_move() #Initialize direction randomly past_sec = 0 while t < duration: #run curr_run_time = np.random.exponential(run_time_expected) #get run duration, float #displacement on either direction is calculated as the projection * speed * time #update current position by summing old position and displacement curr_position = curr_position + np.array([projection_h, projection_v]) * speed * curr_run_time #tumble projection_h, projection_v, tumble_time = tumble_move() #increment time t += (curr_run_time + tumble_time) #record position approximate for integer number of second curr_sec = int(t) for sec in range(past_sec, min(curr_sec, duration) + 1): #fill values from last time point 
to current time point path[rep, sec] = curr_position.copy() past_sec = curr_sec return pathNow that we have established parameters and written the functions that we will need, we will run our simulation with num_cells equal to 3 and duration equal to 800 to get a rough idea of what the trajectories of our simulated cells will look like.#Run simulation for 3 cells, plot pathduration = 800 #seconds, duration of the simulation, intnum_cells = 3 #number of cells, intorigin_to_center = distance(start, ligand_center) #Update the global constantrun_time_expected = 1.0 #expected run time before tumble, float#Calls the simulate functionpath = simulate_std_random(num_cells, duration, run_time_expected) #get the simulated trajectoriesprint(path[:,-1,:]) #print the terminal position of each simulationVisualizing simulated cell trajectoriesNow that we have generated the data of our randomly walking cells, our next step is to plot these trajectories using Matplotlib. We will color-code the background ligand concentration. The ligand concentrations at each position (a, b) where a and b are both integers can be represented using a matrix, and we take the logarithm of each value of this matrix to better color our exponential gradient. That is, a value of 10^8 will be converted to 8, and a value of 10^4 will be converted to 4. 
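This conversion is just a base-10 logarithm, which compresses a gradient spanning many orders of magnitude into a range a color map can handle. A quick check:

```python
import math

# Color by exponent: an exponential gradient spans many orders of
# magnitude, so we plot log10 of the concentration instead.
for conc in [10 ** 2, 10 ** 4, 10 ** 8]:
    print(math.log10(conc))  # 2.0, 4.0, 8.0
```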
A white background color will indicate a low ligand concentration, while red indicates high concentration.#Below are all for plotting purposes#Initialize the plot with 1*1 subplot of size 8*8fig, ax = plt.subplots(1, 1, figsize = (8, 8))#First set color map to color-code the concentrationmycolor = [[256, 256, 256], [256, 255, 254], [256, 253, 250], [256, 250, 240], [255, 236, 209], [255, 218, 185], [251, 196, 171], [248, 173, 157], [244, 151, 142], [240, 128, 128]] #RGB values, from coolors:)for i in mycolor: for j in range(len(i)): i[j] *= (1/256) #normalize to 0~1 rangecmap_color = colors.LinearSegmentedColormap.from_list('my_list', mycolor) #Linearly segment these colors to create a continuous color map#Store the concentrations for each integer position in a matrixconc_matrix = np.zeros((4000, 4000)) #we will display from [-1000, -1000] to [3000, 3000]for i in range(4000): for j in range(4000): conc_matrix[i][j] = math.log10(calc_concentration([i - 1000, j - 1000])) #calculate the exponents of concentrations at each location#Simulate the gradient distribution, plot as a heatmapax.imshow(conc_matrix.T, cmap=cmap_color, interpolation='nearest', extent = [-1000, 3000, -1000, 3000], origin = 'lower')Next, we plot each cell’s trajectory over each of its tumbling points. To visualize older vs. newer time points, we set the color as a function of t so that newer points have lighter colors.#Plot simulation resultstime_frac = 1.0 / duration#Plot the trajectories. 
Time progress: dark -> colorfulfor t in range(duration): ax.plot(path[0,t,0], path[0,t,1], 'o', markersize = 1, color = (0.2 * time_frac * t, 0.85 * time_frac * t, 0.8 * time_frac * t)) ax.plot(path[1,t,0], path[1,t,1], 'o', markersize = 1, color = (0.85 * time_frac * t, 0.2 * time_frac * t, 0.9 * time_frac * t)) ax.plot(path[2,t,0], path[2,t,1], 'o', markersize = 1, color = (0.4 * time_frac * t, 0.85 * time_frac * t, 0.1 * time_frac * t))ax.plot(start[0], start[1], 'ko', markersize = 8) #Mark the starting point [0, 0]for i in range(num_cells): ax.plot(path[i,-1,0], path[i,-1,1], 'ro', markersize = 8) #Mark the terminal points for each cellWe mark the starting point of each cell’s trajectory with a black dot and the ending point of the trajectory with a red dot. We place a blue cross over the goal. Finally, we set axis limits, assign axis labels, and generate the plot.ax.plot(1500, 1500, 'bX', markersize = 8) #Mark the highest concentration point [1500, 1500]ax.set_title("Pure random walk \n Background: avg tumble every {} s".format(run_time_expected), x = 0.5, y = 0.87)ax.set_xlim(-1000, 3000)ax.set_ylim(-1000, 3000)ax.set_xlabel("position in um")ax.set_ylabel("position in um")plt.show()STOP: Run the notebook. What do you observe? Are the cells moving up the gradient? Is this a good strategy for a bacterium to use to search for food?Quantifying the performance of our search algorithmWe already know from our work in previous modules that a random walk simulation can produce very different outcomes. In order to assess the performance of the random walk algorithm, we will simulate num_cells = 500 cells and duration = 1500 seconds.Visualizing the trajectories for this many cells will be messy. 
Instead, we will measure the distance between each cell and the target at the end of the simulation, and then take the average and standard deviation of this value over all cells.#Run simulation for 500 cells, plot average distance to highest concentration pointduration = 1500 #seconds, duration of the simulationnum_cells = 500 #number of cells, intorigin_to_center = distance(start, ligand_center) #Update the global constantrun_time_expected = 1.0 #expected run time before tumble, floatall_distance = np.zeros((num_cells, duration)) #Initialize to store results, array with shape (num_cells, duration)paths = simulate_std_random(num_cells, duration, run_time_expected) #run simulationfor cell in range(num_cells): for time in range(duration): pos = paths[cell, time] #get the position [x,y] for the cell at a given time dist = distance(ligand_center, pos) #calculate the Euclidean distance between that position to [1500, 1500] all_distance[cell, time] = dist #record this distance# For all time, take average and standard deviation over all cells.all_dist_avg = np.mean(all_distance, axis = 0) #Calculate average over cells, array of shape (duration,)all_dist_std = np.std(all_distance, axis = 0) #Calculate the standard deviation, array of shape (duration,)We will then plot the average and standard deviation of the distance to goal using the plot and fill_between functions.#Below are all for plotting purposes#Define the colors to usecolors1 = colorspace.qualitative_hcl(h=[0, 300.], c = 60, l = 70, palette = "dynamic")(1)xs = np.arange(0, duration) #Set the x-axis for plot: time points. Array of integers of shape (duration,)fig, ax = plt.subplots(1, 1, figsize = (10, 8)) #Initialize the plot with 1*1 subplot of size 10*8mu, sig = all_dist_avg, all_dist_std#Plot average distance vs. 
timeax.plot(xs, mu, lw=2, label="pure random walk, background tumble every {} second".format(run_time_expected), color=colors1[0])#Fill in average +/- one standard deviation vs. timeax.fill_between(xs, mu + sig, mu - sig, color = colors1, alpha=0.15)ax.set_title("Average distance to highest concentration")ax.set_xlabel('time (s)')ax.set_ylabel('distance to center (µm)')ax.hlines(0, 0, duration, colors='gray', linestyles='dashed', label='concentration 10^8')ax.legend(loc='upper right')ax.grid()STOP: Before visualizing the average distances at each time step, what do you expect the average distance to the goal to be?Now, run the notebook. The colored line indicates the average distance of the 500 cells; the shaded area is the standard deviation; and the grey dashed line corresponds to the maximum ligand concentration of 10^8.As mentioned, you may not be surprised that this simple random walk strategy is not very effective at finding the goal. Not to worry: in the main text, we discuss how to adapt this strategy into one that better reflects how E. coli explores its environment based on what we have learned in this module about chemotaxis.Return to main text Saragosti J., Silberzan P., Buguin A. 2012. Modeling E. coli tumbles by rotational diffusion. Implications for chemotaxis. PLoS One 7(4):e35412. Available online. ↩ Baker MD, Wolanin PM, Stock JB. 2005. Signal transduction in bacterial chemotaxis. BioEssays 28:9-22. Available online ↩ "
} ,
{
"title" : "Software Tutorial: Traveling Down an Attractant Gradient",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_removal",
"date" : "",
"content" : "In the previous tutorial, we simulated the behavior of a bacterium moving up the concentration gradient. In this tutorial, we will simulate the opposite - when the bacterium is not in luck and moves down a concentration gradient.To get started, create a copy of your adaptation.bngl file from the adaptation tutorial and save it as removal.bngl. If you would rather not follow along below, you can download a completed BioNetGen file here: removal.bngl.We also will build a Jupyter notebook in this tutorial for plotting the concentrations of different particles over time. To do so, you should save a copy of your plotter_up.ipynb file called plotter_down.ipynb; if you would rather not follow along, we provide a completed notebook here: plotter_down.ipynbBefore running this notebook, make sure the following dependencies are installed. Installation Link Version Check install/version Python3 3.6+ python --version Jupyter Notebook 4.4.0+ jupyter --version Numpy 1.14.5+ pip list \| grep numpy Matplotlib 3.0+ pip list \| grep matplotlib Colorspace or with pip any pip list \| grep colorspace Modeling a decreasing ligand gradient with a BioNetGen functionWe have simulated how the concentration of phosphorylated CheY changes when the cell moves up the attractant gradient. The concentration dips, but over time, methylation states change so that they can compensate for the increased ligand-receptor binding and restore the equilibrium of phosphorylated CheY. What if instead ligands are removed, as we would see if the bacterium is traveling down an attractant gradient? We might imagine that we would see an increase in phosphorylated CheY to increase tumbling and change course, followed by a return to steady-state. But is this what we will see?To simulate the cell traveling down an attractant gradient, we will add a kill reaction removing unbound ligand at a constant rate. 
To do so, we will add the following rule within the reaction rules section.#Simulate ligand removalLigandGone: L(t) -> 0 k_goneIn the parameters section, we start by defining k_gone to be 0.3, so that d[L]/dt = -0.3[L]. The solution of this differential equation is [L] = 10^7 e^(-0.3t). We will also change the initial ligand concentration (L0) to be 1e7. Thus, the concentration of ligand becomes so low that ligand-receptor binding reaches 0 within 50 seconds.k_gone 0.3L0 1e7We will set the initial concentrations of all seed species to be the final steady-state concentrations from the result for our adaptation.bngl model, and see whether, after the concentration of unbound ligand is gradually reduced, the simulation can restore these concentrations to steady state.First, visit the adaptation.bngl model and add the concentration for each combination of methylation state and ligand binding state of the receptor complex to the observables section. Then run this simulation with L0 equal to 1e7.When the simulation is finished, visit RuleBender-workspace/PROJECT_NAME/results/adaptation/ and find the simulation result at the final time point.When the model finishes running, input the final concentrations of molecules into the seed species section of our removal.bngl model. Here is what we have.begin seed species @EC:L(t) L0 @PM:T(l!1,r,Meth~A,Phos~U).L(t!1) 1190 @PM:T(l!1,r,Meth~B,Phos~U).L(t!1) 2304 @PM:T(l!1,r,Meth~C,Phos~U).L(t!1) 2946 @PM:T(l!1,r,Meth~A,Phos~P).L(t!1) 2 @PM:T(l!1,r,Meth~B,Phos~P).L(t!1) 156 @PM:T(l!1,r,Meth~C,Phos~P).L(t!1) 402 @CP:CheY(Phos~U) CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesRunning the BioNetGen modelWe are now ready to run our BioNetGen model. 
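Rather than copying these final concentrations by hand, you can also read them out of the adaptation run programmatically. Below is a minimal sketch, assuming the .gdat layout used by BioNetGen (a commented header line of column names followed by one row of values per time step); the file path shown is hypothetical:

```python
import numpy as np

# Sketch: return the observable values at the final time point of a .gdat file.
# The header line looks like '#  time  bound_ligand ...', so we drop the '#'.
def final_concentrations(gdat_path):
    with open(gdat_path) as f:
        names = f.readline().split()[1:]  # ['time', observable names...]
    data = np.loadtxt(gdat_path)          # rows: time points; cols: names
    final = dict(zip(names, data[-1]))    # values from the last row
    final.pop('time', None)               # keep only the observables
    return final

# final_concentrations('results/adaptation/adaptation.gdat')  # hypothetical path
```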
To do so, first add the following after end model to run our simulation over 1800 seconds.generate_network({overwrite=>1})simulate({method=>"ssa", t_end=>1800, n_steps=>1800})The following code contains our complete simulation, which can also be downloaded here: removal.bngl.begin modelbegin molecule types L(t) T(l,r,Meth~A~B~C,Phos~U~P) CheY(Phos~U~P) CheZ() CheB(Phos~U~P) CheR(t)end molecule typesbegin observables Molecules bound_ligand L(t!1).T(l!1) Molecules phosphorylated_CheY CheY(Phos~P) Molecules low_methyl_receptor T(Meth~A) Molecules medium_methyl_receptor T(Meth~B) Molecules high_methyl_receptor T(Meth~C) Molecules phosphorylated_CheB CheB(Phos~P)end observablesbegin parameters NaV2 6.02e8 #Unit conversion to cellular concentration M/L -> #/um^3 miu 1e-6 L0 1e7 T0 7000 CheY0 20000 CheZ0 6000 CheR0 120 CheB0 250 k_lr_bind 8.8e6/NaV2 #ligand-receptor binding k_lr_dis 35 #ligand-receptor dissociation k_TaUnbound_phos 7.5 #receptor complex autophosphorylation k_Y_phos 3.8e6/NaV2 #receptor complex phosphorylates Y k_Y_dephos 8.6e5/NaV2 #Z dephosphorylates Y k_TR_bind 2e7/NaV2 #Receptor-CheR binding k_TR_dis 1 #Receptor-CheR dissociation k_TaR_meth 0.08 #CheR methylates receptor complex k_B_phos 1e5/NaV2 #CheB phosphorylation by receptor complex k_B_dephos 0.17 #CheB autodephosphorylation k_Tb_demeth 5e4/NaV2 #CheB demethylates receptor complex k_Tc_demeth 2e4/NaV2 #CheB demethylates receptor complex k_gone 0.3end parametersbegin reaction rules LigandReceptor: L(t) + T(l) <-> L(t!1).T(l!1) k_lr_bind, k_lr_dis #Receptor complex (specifically CheA) autophosphorylation #Rate dependent on methylation and binding states #Also on free vs. 
bound with ligand TaUnboundP: T(l,Meth~A,Phos~U) -> T(l,Meth~A,Phos~P) k_TaUnbound_phos TbUnboundP: T(l,Meth~B,Phos~U) -> T(l,Meth~B,Phos~P) k_TaUnbound_phos*1.1 TcUnboundP: T(l,Meth~C,Phos~U) -> T(l,Meth~C,Phos~P) k_TaUnbound_phos*2.8 TaLigandP: L(t!1).T(l!1,Meth~A,Phos~U) -> L(t!1).T(l!1,Meth~A,Phos~P) 0 TbLigandP: L(t!1).T(l!1,Meth~B,Phos~U) -> L(t!1).T(l!1,Meth~B,Phos~P) k_TaUnbound_phos*0.8 TcLigandP: L(t!1).T(l!1,Meth~C,Phos~U) -> L(t!1).T(l!1,Meth~C,Phos~P) k_TaUnbound_phos*1.6 #CheY phosphorylation by T and dephosphorylation by CheZ YPhos: T(Phos~P) + CheY(Phos~U) -> T(Phos~U) + CheY(Phos~P) k_Y_phos YDephos: CheZ() + CheY(Phos~P) -> CheZ() + CheY(Phos~U) k_Y_dephos #CheR binds to and methylates receptor complex #Rate dependent on methylation states and ligand binding TRBind: T(r) + CheR(t) <-> T(r!2).CheR(t!2) k_TR_bind, k_TR_dis TaRUnboundMeth: T(r!2,l,Meth~A).CheR(t!2) -> T(r,l,Meth~B) + CheR(t) k_TaR_meth TbRUnboundMeth: T(r!2,l,Meth~B).CheR(t!2) -> T(r,l,Meth~C) + CheR(t) k_TaR_meth*0.1 TaRLigandMeth: T(r!2,l!1,Meth~A).L(t!1).CheR(t!2) -> T(r,l!1,Meth~B).L(t!1) + CheR(t) k_TaR_meth*30 TbRLigandMeth: T(r!2,l!1,Meth~B).L(t!1).CheR(t!2) -> T(r,l!1,Meth~C).L(t!1) + CheR(t) k_TaR_meth*3 #CheB is phosphorylated by receptor complex, and autodephosphorylates CheBphos: T(Phos~P) + CheB(Phos~U) -> T(Phos~U) + CheB(Phos~P) k_B_phos CheBdephos: CheB(Phos~P) -> CheB(Phos~U) k_B_dephos #CheB demethylates receptor complex #Rate dependent on methylation states TbDemeth: T(Meth~B) + CheB(Phos~P) -> T(Meth~A) + CheB(Phos~P) k_Tb_demeth TcDemeth: T(Meth~C) + CheB(Phos~P) -> T(Meth~B) + CheB(Phos~P) k_Tc_demeth #Simulate ligand removal LigandGone: L(t) -> 0 k_goneend reaction rulesbegin compartments EC 3 100 #um^3 PM 2 1 EC #um^2 CP 3 1 PM #um^3end compartmentsbegin seed species @EC:L(t) L0 @PM:T(l!1,r,Meth~A,Phos~U).L(t!1) 1190 @PM:T(l!1,r,Meth~B,Phos~U).L(t!1) 2304 @PM:T(l!1,r,Meth~C,Phos~U).L(t!1) 2946 @PM:T(l!1,r,Meth~A,Phos~P).L(t!1) 2 
@PM:T(l!1,r,Meth~B,Phos~P).L(t!1) 156 @PM:T(l!1,r,Meth~C,Phos~P).L(t!1) 402 @CP:CheY(Phos~U) CheY0*0.71 @CP:CheY(Phos~P) CheY0*0.29 @CP:CheZ() CheZ0 @CP:CheB(Phos~U) CheB0*0.62 @CP:CheB(Phos~P) CheB0*0.38 @CP:CheR(t) CheR0end seed speciesend modelgenerate_network({overwrite=>1})simulate({method=>"ssa", t_end=>1800, n_steps=>1800})Save your file, and then visit simulation and click Run. What happens to the concentration of phosphorylated CheY? Are the concentrations of complexes at different methylation states restored to their levels before adding ligands to the adaptation.bngl model?As we did in the tutorial simulating increasing ligand, we can try different values for k_gone. Change t_end in the simulate method to 1800 seconds, and run the simulation with k_gone equal to 0.01, 0.03, 0.05, 0.1, and 0.5.All simulation results are stored in the RuleBender-workspace/PROJECT_NAME/results/MODEL_NAME/TIME/ directory on your computer. Rename the directory with the k_gone values instead of the timestamp for simplicity.Visualizing the results of our simulationWe will use the Jupyter notebook plotter_up.ipynb as a template for the plotter_down.ipynb file that we will use to visualize our results. First, we will specify the directories, model name, species of interest, and reaction rates. Put the RuleBender-workspace/PROJECT_NAME/results/MODEL_NAME/ folder inside the same directory as the Jupyter notebook or change the model_path accordingly.model_path = "removal" #The folder containing the modelmodel_name = "removal" #Name of the modeltarget = "phosphorylated_CheY" #Target moleculevals = [0.01, 0.03, 0.05, 0.1, 0.3, 0.5] #Gradients of interestThe second code block is the same as that provided in the previous tutorial. This code loads the simulation result at each time point from the .gdat file, which stores the concentration of all observables at all steps. 
It then plots the concentration of phosphorylated CheY over time.import numpy as npimport sysimport osimport matplotlib.pyplot as pltimport colorspace#Define the colors to usecolors = colorspace.qualitative_hcl(h=[0, 300.], c = 60, l = 70, palette = "dynamic")(len(vals))def load_data(val): file_path = os.path.join(model_path, str(val), model_name + ".gdat") with open(file_path) as f: first_line = f.readline() #Read the first line cols = first_line.split()[1:] #Get the col names (species names) ind = 0 while cols[ind] != target: ind += 1 #Get col number of target molecule data = np.loadtxt(file_path) #Load the file time = data[:, 0] #Time points concentration = data[:, ind] #Concentrations return time, concentrationdef plot(val, time, concentration, ax, i): legend = "k = - " + str(val) ax.plot(time, concentration, label = legend, color = colors[i]) ax.legend() returnfig, ax = plt.subplots(1, 1, figsize = (10, 8))for i in range(len(vals)): val = vals[i] time, concentration = load_data(val) plot(val, time, concentration, ax, i)plt.xlabel("time")plt.ylabel("concentration (#molecules)")plt.title("Active CheY vs time")ax.minorticks_on()ax.grid(b = True, which = 'minor', axis = 'both', color = 'lightgrey', linewidth = 0.5, linestyle = ':')ax.grid(b = True, which = 'major', axis = 'both', color = 'grey', linewidth = 0.8 , linestyle = ':')plt.show()Run the notebook. How does the value of k_gone impact the concentration of phosphorylated CheY? Why? Are the tumbling frequencies restored to the background frequency? As we return to the main text, we will show the resulting plots and discuss these questions.Return to main text"
} ,
{
"title" : "Software Tutorial: Using RMSD to Compare the Predicted SARS-CoV-2 Spike Protein Against its Experimentally Validated Structure",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_rmsd",
"date" : "",
"content" : "In this tutorial, we will demonstrate how to apply the Kabsch algorithm to compute the RMSD between two protein structures. In particular, we will show how to compare the experimentally validated structure of the SARS-CoV-2 spike protein (PDB entry: 6vxx) against one of our resulting from homology modeling. You should then feel empowered to run this comparison on our other spike protein predictions, as well as compare our ab initio prediction of human hemoglobin subunit alpha against its validated structure (PDB entry: 1si4).Below is a folder containing all the models that we produced so far along with the experimentally validated structures. Please consult the included README.txt to see which PDB structure to use for comparison against each predicted structure.Download modelsGetting startedThis tutorial will be our first encounter with ProDy, our featured software resource in this module. ProDy is an open-source Python package that allows users to perform protein structural dynamics analysis. Its flexibility allows users to select specific parts or atoms of the structure for conducting normal mode analysis and structure comparison. If you’re not interested in following along, you can download a Jupyter notebook along with all files needed to run this tutorial below.Download completed tutorialTo get started, make sure that you have the following software resources installed.Python (2.7, 3.5, or later)ProDyNumPyBiopythonIPythonMatplotlibWe recommend that you create a workspace (directory) for storing created files when using ProDy or storing protein .pdb files. Open your computer’s terminal app and navigate to this directory using the cd (if you are new to using a command line interface, please consult this introduction.). 
Then, start IPython using the following command.ipython --pylabFirst, import needed packages and turn interactive mode on (you only need to do this once per session).In[#]: from pylab import *In[#]: from prody import *In[#]: ion()Calculating RMSD of two chainsWe will first compute RMSD for a single chain of the spike protein homotrimer. Because we are dealing with the entire spike protein, we will need to “match” chains that have the greatest paired similarity between our prediction and the result.The first built-in ProDy function that we will use, called parsePDB, parses a protein structure in .pdb format. To use our own protein structure, make sure that the .pdb file is in the current directory. Let’s parse in one of the models we obtained from homology modeling of the SARS-CoV-2 Spike protein, SWISS1. You can use your own SARS-CoV-2 Spike protein model that you generated. In this tutorial, our model will be called swiss1.pdb.In[#]: struct1 = parsePDB('swiss1.pdb')Because we want to find out how well swiss1.pdb performed, we will compare it to the experimentally determined structure of the SARS-CoV-2 Spike protein in the Protein Data Bank. Enter the code shown below. Because the .pdb extension is missing, this command will prompt the console to search for 6vxx, the SARS-CoV-2 Spike protein, from the Protein Data Bank and download the .pdb file into the current directory. It will then save the structure as the variable struct2.In[#]: struct2 = parsePDB('6vxx')With the protein structures parsed, we can now match chains. To do so, we use the built-in function matchChains, specifying a sequence identity threshold of 75% and an overlap threshold of 80% (the default is 90% for both parameters). The following function call stores the result in a 2D array called matches. 
matches[i] denotes the i-th match found between two chains that are stored as matches[i][0] and matches[i][1].In[#]: matches = matchChains(struct1, struct2, seqid = 75, overlap = 80)We will now define our own function that will print matched chains.In[#]: def printMatch(match):...: print('Chain 1 : {}'.format(match[0]))...: print('Chain 2 : {}'.format(match[1]))...: print('Length : {}'.format(len(match[0])))...: print('Seq identity: {}'.format(match[2]))...: print('Seq overlap : {}'.format(match[3]))...: print('RMSD : {}\n'.format(calcRMSD(match[0], match[1])))...:Let’s call our new function printMatch on our previous variable matches.In[#]: for match in matches:…: printMatch(match)…:You should see the results printed out as follows.For example, matches[0][0] corresponds to Chain 1 : AtomMap Chain A from swiss1 -> Chain A from 6vxx and matches[5][1] corresponds to Chain 2: AtomMap Chain C from 6vxx -> Chain B from swiss1.Say that we want to calculate the RMSD score between the matched Chain B from swiss1 and Chain B from 6vxx. This will correspond to matches[4][0] and matches[4][1]. After accessing these two structures, we need to apply the Kabsch algorithm to superimpose and rotate the two structures so that they are as similar as possible, which we do with the built-in function calcTransformation.In[#]: first_ca = matches[4][0]In[#]: second_ca = matches[4][1]In[#]: calcTransformation(first_ca, second_ca).apply(first_ca);Now that the best rotation has been found, we can determine the RMSD between the structures using the built-in function calcRMSD.In[#]: calcRMSD(first_ca, second_ca)You should now see something like the following:Merging multiple chains to compute RMSD of an overall structureNow that we can compare the structures of two chains, it is also possible to merge the chains and calculate the RMSD of the overall homotrimer structure. 
Below, we merge the three matches corresponding to matching the A chains, B chains, and C chains of the two proteins, and we then compute the RMSD of the resulting structures.In[#]: first_ca = matches[0][0] + matches[4][0] + matches[8][0]In[#]: second_ca = matches[0][1] + matches[4][1] + matches[8][1]In[#]: calcTransformation(first_ca, second_ca).apply(first_ca);In[#]: calcRMSD(first_ca, second_ca)Your results should look like the following:We will leave the RMSD computation for the other models we produced as an exercise.STOP: Apply what you have learned in this tutorial to compute the RMSD between the SARS-CoV-2 spike protein and every one of our predicted homology models, as well as between human hemoglobin subunit alpha and its ab initio model. Download the predicted models here; you should consult the included readme for reference.We are now ready to head back to the main text, where we will discuss the RMSD calculations for all models. Were we successful in predicting the structure of the SARS-CoV-2 spike protein?Return to main text"
} ,
{
"title" : "Software Tutorial: Generalizing and Visualizing an Image Shape Space",
"category" : "",
"tags" : "",
"url" : "/white_blood_cells/tutorial_shape_space",
"date" : "",
"content" : "Step 4 Shape Space VisualizationHaving generated a PCA model from the images and completed the post – processing in Step 3, we are now ready to visualize our results!In this step of the pipeline, we will use Python to visualize the computed shape space by label. The goal is to see clusters of cells within the same region of the same type. We can classify our white blood cells by two different parameters, cell type or cell family. If we are classifying by cell family, then we are attempting to classify images into the three classes of granulocytes, lymphocytes, and monocytes. We can also classify by cell type, in which case granulocytes subdivide out into neutrophils, eosinophils, and basophils, so that we are dividing the data into five classes.Classification by cell familyFirst, we will classify images by cell family. Open a new terminal window and run the following commands:> cd ~/Desktop/WBC_PCAPipeline/Step4_Visualization> python WBC_CellFamily.pyAs a result, you can click, drag, and rotate the graphical space to see the clusters of cell classes by color (a legend can be found in the upper right - hand corner). Furthermore, an image file of this visualization is saved within the current directory under WBC_ShapeSpace_CF.png.Classification by cell typeNow we will classify images by cell type. Open a new terminal window and run the following commands.> cd ~/Desktop/WBC_PCAPipeline/Step4_Visualization> python WBC_CellType.pyAs a result, you can click, drag, and rotate the graphical space to see the clusters of cell classes by color (a legend can be found in the upper right - hand corner). Furthermore, an image file of this visualization is saved within the current directory under WBC_ShapeSpace_CT.png."
} ,
{
"title" : "Software Tutorial: Comparing different chemotaxis default tumbling frequencies",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_tumbling_frequencies",
"date" : "",
"content" : "In this tutorial, we will run a comparison of the chemotactic random walk over a variety of different background tumbling frequencies. Are some frequencies better than others at helping the bacterium reach the goal?Qualitative comparison of different background tumbling frequenciesFirst, we will use chemotaxis_walk.ipynb from our modified random walk tutorial to compare the trajectories of a few cells for different tumbling frequencies.Specifically, we will run our simulation for three cells over a time period of 800 seconds. We simulate each cell multiple times using a variety of different tumbling frequencies. (We use average tumbling frequencies of 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, and 10.0 seconds.) This will give us a rough idea of what the trajectories look like.duration = 800 #seconds, duration of the simulationnum_cells = 3origin_to_center = euclidean_distance(start, ligand_center) #Update the global constantrun_time_expected_all = [0.5, 1.0, 5.0]paths = np.zeros((len(run_time_expected_all), num_cells, duration + 1, 2))for i in range(len(run_time_expected_all)): run_time_expected = run_time_expected_all[i] paths[i] = simulate_chemotaxis(num_cells, duration, run_time_expected)As we did previously, we then plot the trajectories.conc_matrix = np.zeros((3500, 3500))for i in range(3500): for j in range(3500): conc_matrix[i][j] = math.log(calc_concentration([i - 500, j - 500]))mycolor = [[256, 256, 256], [256, 255, 254], [256, 253, 250], [256, 250, 240], [255, 236, 209], [255, 218, 185], [251, 196, 171], [248, 173, 157], [244, 151, 142], [240, 128, 128]] #from coolors:)for i in mycolor: for j in range(len(i)): i[j] *= (1/256)cmap_color = colors.LinearSegmentedColormap.from_list('my_list', mycolor)for freq_i in range(len(run_time_expected_all)): fig, ax = plt.subplots(1, figsize = (8, 8)) ax.imshow(conc_matrix.T, cmap=cmap_color, interpolation='nearest', extent = [-500, 3000, -500, 3000], origin = 'lower') #Plot simulation results time_frac = 1.0 / 
duration #Time progress: dark -> colorful for t in range(duration): ax.plot(paths[freq_i,0,t,0], paths[freq_i,0,t,1], 'o', markersize = 1, color = (0.2 * time_frac * t, 0.85 * time_frac * t, 0.8 * time_frac * t)) ax.plot(paths[freq_i,1,t,0], paths[freq_i,1,t,1], 'o', markersize = 1, color = (0.85 * time_frac * t, 0.2 * time_frac * t, 0.9 * time_frac * t)) ax.plot(paths[freq_i,2,t,0], paths[freq_i,2,t,1], 'o', markersize = 1, color = (0.4 * time_frac * t, 0.85 * time_frac * t, 0.1 * time_frac * t)) ax.plot(start[0], start[1], 'ko', markersize = 8) ax.plot(1500, 1500, 'bX', markersize = 8) for i in range(num_cells): ax.plot(paths[freq_i,i,-1,0], paths[freq_i,i,-1,1], 'ro', markersize = 8) ax.set_title("Background tumbling freq:\n tumble every {} s".format(run_time_expected_all[freq_i]), x = 0.5, y = 0.9, fontsize = 12) ax.set_xlim(-500, 3000) ax.set_ylim(-500, 3000) ax.set_xlabel("position in μm") ax.set_ylabel("position in μm")plt.show()STOP: Run the code blocks for simulating the random walks and plotting the outcome. Are the cells moving up the gradient? How do the shapes of the trajectories differ for different tumbling frequencies? What value of the average tumbling frequency do you think is best?Comparing tumbling frequencies over many cellsWe will now scale up our simulation to num_cells = 500 cells. 
To rigorously compare the results of the simulation for different default tumbling frequencies, we will calculate the average distance to the center at each time step for each tumbling frequency that we use.#Run simulation for 500 cells with different background tumbling frequencies, Plot average distance to highest concentration pointduration = 1500 #seconds, duration of the simulationnum_cells = 500 #number of cells (reduce to 300 if the simulation runs slowly)run_time_expected_all = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]origin_to_center = euclidean_distance(start, ligand_center) #Update the global constantall_distance = np.zeros((len(run_time_expected_all), num_cells, duration)) #Initialize to store resultspaths = np.zeros((len(run_time_expected_all), num_cells, duration + 1, 2))for i in range(len(run_time_expected_all)): run_time_expected = run_time_expected_all[i] paths[i] = simulate_chemotaxis(num_cells, duration, run_time_expected)for freq_i in range(len(run_time_expected_all)): for c in range(num_cells): for t in range(duration): pos = paths[freq_i, c, t] dist = euclidean_distance(ligand_center, pos) all_distance[freq_i, c, t] = distall_dist_avg = np.mean(all_distance, axis = 1)all_dist_std = np.std(all_distance, axis = 1)print(all_dist_avg[0][-10:])We then plot the average distance to the goal over time for each frequency, where each tumbling frequency is assigned a different color.#Below are all for plotting purposes#Define the colors to usecolors1 = colorspace.qualitative_hcl(h=[0, 300.], c = 60, l = 70, palette = "dynamic")(len(run_time_expected_all))xs = np.arange(0, duration)fig, ax = plt.subplots(1, figsize = (10, 8))for freq_i in range(len(run_time_expected_all)): mu, sig = all_dist_avg[freq_i], all_dist_std[freq_i] ax.plot(xs, mu, lw=2, label="tumble every {} second".format(run_time_expected_all[freq_i]), color=colors1[freq_i]) ax.fill_between(xs, mu + sig, mu - sig, color = colors1[freq_i], alpha=0.1)ax.set_title("Average distance to highest concentration")ax.set_xlabel('time (s)')ax.set_ylabel('distance to center 
(µm)')ax.legend(loc='lower left', ncol = 1)ax.grid()STOP: Run the code blocks we have provided, simulating the random walks and plotting the average distance to the goal over time for each tumbling frequency. Is there any difference in the performance of the search algorithm for different tumbling frequencies? For each frequency, how long does it take the cell to “reach” the goal? And can we conclude that one tumbling frequency is better than the others?As we return to the main text, we interpret the results of this final tutorial. It turns out that there are significant differences in our chemotaxis algorithm’s ability to find (and remain at) the goal for differing default tumbling frequencies. It hopefully will not surprise you to learn that the frequency that evolution has bestowed upon E. coli turns out to be optimal.Return to main text"
} ,
{
"title" : "Software Tutorial: Visualizing Specific Regions of Interest within the Spike Protein Structure",
"category" : "",
"tags" : "",
"url" : "/coronavirus/tutorial_visualization",
"date" : "",
"content" : "In this tutorial, we will discuss how to visualize a protein structure and highlight specific amino acids of interest in the protein. We will focus on the region that we identified in the previous tutorial starting at around position 475 of the SARS-CoV-2 RBD, where we found that this RBD differs structurally from that of SARS-CoV.We will visualize the site in the SARS-CoV-2 RBD using the SARS-CoV-2 chimeric RBD complexed with the human ACE2 enzyme (PDB entry: 6vw1). Before completing this tutorial, you should have installed VMD and know how to load molecules into the program. If you need a refresher, visit the previous tutorial.First, download the chimeric RBD .pdb file and load it into VMD.To change the visualization of the protein, click Graphics > Representation. Double clicking on a representation will enable/disable it.The file 6vw1.pdb contains two biological assemblies of the complex. The first assembly contains Chain A (taken from ACE2) and chain E (taken from RBD), and the second assembly contains chain B (ACE2) and chain F (RBD). We will focus on the second assembly.We will first add chain B, from the ACE2 enzyme, and color it green. Selected Atoms allows you to select specific parts of the molecule. The keyword all selects all atoms in the file, and so replace all with chain B. Then, click Apply. (In general, to choose a specific chain, use the expression chain X, where X is the chain of interest. To choose a specific residue (i.e., amino acid), use the keyword resid #. Expressions can be combined using the keywords and and or, and more complicated selections need parentheses. Coloring Method allows you to change the coloring method of the selected atoms. This includes coloring based on element, amino acid residue, and many more. To choose a specific color, select ColorID. A drop-down list will appear to color selection. Choose “7” to select green. Drawing Method allows you to change the visualization of the selected atoms. 
Lines (also known as wireframe) draws a line between atoms to represent bonds. Tube focuses only on the backbone of the molecule. Licorice will show both the backbone and the side chains of the protein. Cartoon/NewCartoon will show the secondary structure of the molecule (protein). We are interested mostly in the backbone, and so we will choose Tube. At this point, your OpenGL Display window should look like the following:We next add chain F, from the SARS-CoV-2 chimeric RBD, and color it purple. Click Create Rep, which will duplicate the previous representation. Then, change Selected Atoms to chain F and ColorID to “11” to color the chain purple. Make sure your other selections are as follows:You should now see two distinct colored structures!We can also change our visualization to target specific amino acids by creating another representation and specifying the amino acid with the keyword resid followed by the position of this amino acid residue.For example, say that we are interested in residue 486 in the RBD (which is phenylalanine). Click Create Rep. In the new representation, change Selected Atoms to chain F and resid 486 and click Apply. Then change the Coloring Method to ColorID and 4. Finally, change the Drawing Method to Licorice.In the OpenGL Display window, you will now see a new yellow projection coming out of the RBD, as shown in the image below. This is residue 486! You may need to rotate the protein a bit to see it. (Instructions on how to rotate a molecule and zoom in and out within VMD were given in our tutorial on finding local protein differences.)Let’s now color a few more residues from our region of interest: residues 475 and 487 of the RBD, and residues 19, 24, 79, 82, and 83 of ACE2. As we return to the main text, we will explain why these residues are implicated in binding affinity.Coloring these residues is analogous to the previous steps of just adding new representations and changing Selected Atoms, Coloring Method, and Drawing Method. 
Make the following representations; note the colors that we use.Your final visualization should look like the following figure.Congratulations! You have now created a detailed visualization of the RBD-ACE2 complex that focuses on our site of interest. As we return to the main text, we will discuss how the highlighted amino acids help increase the binding affinity of the SARS-CoV-2 spike protein to ACE2.STOP: Create another visualization of the same site using the SARS-CoV RBD complex with ACE2 (PDB entry: 2ajf). How does it compare with your first visualization of the SARS-CoV-2 complex? Use the graphical representations shown in the table below.SARS-CoV RBD with ACE2 (2ajf) Protein Style Color Selection SARS-CoV RBD Tube ColorID 11 chain F SARS-CoV RBD Licorice ColorID 4 chain F and resid 472 ACE2 Licorice ColorID 6 chain B and (resid 82 or resid 79 or resid 83) ACE2 Licorice ColorID 10 chain B and resid 19 ACE2 Licorice ColorID 3 chain B and resid 24 Return to main text"
} ,
{
"title" : "Chemotactic random walk",
"category" : "",
"tags" : "",
"url" : "/chemotaxis/tutorial_walk",
"date" : "",
"content" : "In a previous tutorial, we built a Jupyter notebook to simulate the movement of a cell moving randomly throughout two-dimensional space in a sequence of steps. At each step, the next direction of the cell’s movement is chosen completely randomly. We called this simple algorithm “strategy 1” in the main text.In this tutorial, we will adapt this simulation into one that attempts to more closely mimic the real behavior of E. coli chemotaxis. We will then be able to compare the results of these two algorithms back in the main text.Simulation files and dependenciesDownload the simulation and visualization here: chemotaxis_compare.ipynb. Detailed explanation of the model and each functions can be found in the file too.Make sure the following dependencies are installed: Installation Link Version Check install/version Python3 3.6+ python --version Jupyter Notebook 4.4.0+ jupyter --version Numpy 1.14.5+ pip list | grep numpy Matplotlib 3.0+ pip list | grep matplotlib Colorspace or with pip any pip list | grep colorspace Modeling chemotactic walk at a cellular levelOur model will be based on observations from BNG simulation and E. coli biology.Ingredients and simplifying assumptions of the model: Run. The background average duration of each run (time_exp) is a variable of interest. When the cell senses concentration change, the cell changes the expected run duration (exp_run_time). The duration of each run follows an exponential distribution with mean = exp_run_time. Tumble. The duration of cell tumble follows an exponential distribution with mean 0.1s1. When it tumbles, we assume it only changes the orientation for the next run but doesn’t move in space. The degree of reorientation follows uniform distribution from 0° to 360°. Response. As we’ve seen in the BNG model, the cell can respond to the gradient change within 0.5 seconds. In this model, we allow cells to re-measure the concentration after it runs for 0.5 seconds. Gradient. 
We model an exponential gradient centered at [1500, 1500] with a concentration of 10^10. All cells start at [0, 0], which has a concentration of 10^2. The receptors saturate at a concentration of 10^10. Performance. The closer to the center of the gradient, the better. What’s different between this more advanced model and the earlier model using a standard random walk strategy is that we decide the run duration before tumbling based on current vs. past concentrations. We will start from our previous model and modify it only to include this ability. Updating run time before tumbling based on concentrations. In our standard random walk model, we sampled each run duration from an exponential distribution with mean run_time_expected. This time, we will incorporate the concentrations into our sampling. The updated run durations adjusted for concentration changes follow an exponential distribution with mean run_time_expected_adj_conc. When no gradient is present, run_time_expected_adj_conc = run_time_expected. When there is a change in ligand concentration, run_time_expected_adj_conc changes accordingly. The change is calculated as (curr_conc - past_conc) / past_conc to normalize for the exponential gradient. 
We model this response with run_time_expected_adj_conc = run_time_expected * (1 + 10 * change).

# Calculate the wait time for the next tumbling event
# Input: current concentration (float), past concentration (float), position (array [x, y]), expected run time (float)
# Return: duration of current run (float)
def run_duration(curr_conc, past_conc, position, run_time_expected):
    curr_conc = min(curr_conc, saturation_conc) #can't detect higher concentrations once receptors saturate
    past_conc = min(past_conc, saturation_conc)
    change = (curr_conc - past_conc) / past_conc #proportional change in concentration, float
    run_time_expected_adj_conc = run_time_expected * (1 + 10 * change) #adjust based on concentration change, float
    if run_time_expected_adj_conc < 0.000001:
        run_time_expected_adj_conc = 0.000001 #keep wait times positive
    elif run_time_expected_adj_conc > 4 * run_time_expected:
        run_time_expected_adj_conc = 4 * run_time_expected #the decrease in tumbling frequency is bounded
    #Sample the duration of the current run from an exponential distribution, mean=run_time_expected_adj_conc
    curr_run_time = np.random.exponential(run_time_expected_adj_conc)
    return curr_run_time

For each cell, simulate through time with the following procedure:
while t < duration:
  Assess the current concentration
  Update the current run duration curr_run_time
  If curr_run_time < 0.5s:
    1. run for curr_run_time seconds along the current direction
    2. sample the duration of the tumble tumble_time and the resulting direction
    3. increment t by curr_run_time and tumble_time
  If curr_run_time >= 0.5s:
    1. run for 0.5s along the current direction
    2. increment t by 0.5s (the cell will then re-assess the new concentration and decide the duration of the next run)

We need to modify our code in the following way: Add concentration calculation before each run. 
Replace our sampler for run duration with calling the run_duration function Check the conditions with the sampled duration; only tumble after run if curr_run_time < 0.5s# This function performs simulation# Input: number of cells to simulate (int), how many seconds (int), the expected run time before tumble (float)# Return: the simulated trajectories paths: array of shape (num_cells, duration+1, 2)def simulate_chemotaxis(num_cells, duration, run_time_expected): #Takes the shape (num_cells, duration+1, 2) #any point [x,y] on the simulated trajectories can be accessed via paths[cell, time] paths = np.zeros((num_cells, duration + 1, 2)) for rep in range(num_cells): # Initialize simulation t = 0 #record the time elapse curr_position = np.array(start) #start at [0, 0] past_conc = calc_concentration(start) #Initialize concentration projection_h, projection_v, tumble_time = tumble_move() #Initialize direction randomly while t < duration: curr_conc = calc_concentration(curr_position) curr_run_time = run_duration(curr_conc, past_conc, curr_position, run_time_expected) #get run duration, float # if run time (r) is within the step (s), run for r second and then tumble if curr_run_time < response_time: #displacement on either direction is calculated as the projection * speed * time #update current position by summing old position and displacement curr_position = curr_position + np.array([projection_h, projection_v]) * speed * curr_run_time projection_h, projection_v, tumble_time = tumble_move() #tumble t += (curr_run_time + tumble_time) #increment time # if r > s, run for r; then it will be in the next iteration else: #displacement on either direction is calculated as the projection * speed * time #update current position by summing old position and displacement curr_position = curr_position + np.array([projection_h, projection_v]) * speed * response_time t += response_time #no tumble here #record position approximate for integer number of second curr_sec = int(t) if curr_sec 
<= duration: #fill values from last time point to current time point paths[rep, curr_sec] = curr_position.copy() past_conc = curr_conc return pathsCompare performance of the two strategies. Please download the simulation and visualization here: chemotaxis_compare.ipynb. To compare the performance of the two strategies, we visualize the trajectories of a simulation with 3 cells and compare the performance using a simulation with 500 cells for each strategy. Qualitative comparison. Run the code for Part 2: Visualizing trajectories. The background color indicates concentration: white -> red = low -> high; black dots are starting points; red dots are the points the cells reached at the end of the simulation; colored points represent trajectories (one color per cell): dark -> bright color = older -> newer time points; the blue cross indicates the goal. We will simulate 3 cells for 800 seconds for each of the strategies.
#Run simulation for 3 cells for each strategy, plot paths
duration = 800 #seconds, duration of the simulation
num_cells = 3
origin_to_center = distance(start, ligand_center) #Update the global constant
run_time_expected = 1.0
paths_rand = simulate_std_random(num_cells, duration, run_time_expected)
paths_che = simulate_chemotaxis(num_cells, duration, run_time_expected)
paths = np.array([paths_rand, paths_che])
The plotting is similar to before, except that this time we will have two subplots, one for the pure random walk and another for the chemotactic random walk, initialized with plt.subplots(1, 2). 
We will plot the simulation results for each strategy.#Below are all for plotting purposesmethods = ["Pure random walk", "Chemotactic random walk"]fig, ax = plt.subplots(1, 2, figsize = (16, 8)) #1*2 subplots, size 16*8#First set color mapmycolor = [[256, 256, 256], [256, 255, 254], [256, 253, 250], [256, 250, 240], [255, 236, 209], [255, 218, 185], [251, 196, 171], [248, 173, 157], [244, 151, 142], [240, 128, 128]] #from coolors:)for i in mycolor: for j in range(len(i)): i[j] *= (1/256)cmap_color = colors.LinearSegmentedColormap.from_list('my_list', mycolor) #Linearly segment these colors to create a continuous color map#Store the concentrations for each integer position in a matrixconc_matrix = np.zeros((4000, 4000)) #we will display from [-1000, -1000] to [3000, 3000]for i in range(4000): for j in range(4000): conc_matrix[i][j] = math.log(calc_concentration([i - 1000, j - 1000]))#Repeat for the two strategiesfor m in range(2): #Simulate the gradient distribution, plot as a heatmap ax[m].imshow(conc_matrix.T, cmap=cmap_color, interpolation='nearest', extent = [-1000, 3000, -1000, 3000], origin = 'lower') #Plot simulation results time_frac = 1.0 / duration #Plot the trajectories. 
Time progress: dark -> colorful for t in range(duration): ax[m].plot(paths[m,0,t,0], paths[m,0,t,1], 'o', markersize = 1, color = (0.2 * time_frac * t, 0.85 * time_frac * t, 0.8 * time_frac * t)) ax[m].plot(paths[m,1,t,0], paths[m,1,t,1], 'o', markersize = 1, color = (0.85 * time_frac * t, 0.2 * time_frac * t, 0.9 * time_frac * t)) ax[m].plot(paths[m,2,t,0], paths[m,2,t,1], 'o', markersize = 1, color = (0.4 * time_frac * t, 0.85 * time_frac * t, 0.1 * time_frac * t)) ax[m].plot(start[0], start[1], 'ko', markersize = 8) #Mark the starting point [0, 0] for i in range(num_cells): ax[m].plot(paths[m,i,-1,0], paths[m,i,-1,1], 'ro', markersize = 8) #Mark the terminal points for each cell ax[m].plot(1500, 1500, 'bX', markersize = 8) #Mark the highest concentration point [1500, 1500] ax[m].set_title("{}\n Average tumble every 1 s".format(methods[m]), x = 0.5, y = 0.87) ax[m].set_xlim(-1000, 3000) ax[m].set_ylim(-1000, 3000) ax[m].set_xlabel("position in μm") ax[m].set_ylabel("position in μm")fig.tight_layout()plt.show()Which strategy allows the cell to travel towards the higher concentration? Are we ready to conclude which default tumbling frequencies are the best? Quantitative comparison. Because of the high variations due to randomness, trajectories for 3 cells are not convincing enough. To verify your hypothesis on which strategy is better, let’s simulate 500 cells for 1500 seconds for each strategy. Run the two code blocks for Part 3: Comparing performances. 
Each colored line indicates a strategy, plotting average distances for the 500 cells; the shaded area is the standard deviation; the grey dashed line is where the concentration reaches 1e8. As we did above, we run simulations for each strategy.#Run simulation for 500 cells for each strategy, plot pathsduration = 1500 #seconds, duration of the simulationnum_cells = 500origin_to_center = distance(start, ligand_center) #Update the global constantrun_time_expected = 1.0paths_rand = simulate_std_random(num_cells, duration, run_time_expected)paths_che = simulate_chemotaxis(num_cells, duration, run_time_expected)paths = np.array([paths_rand, paths_che])all_distance = np.zeros((2, num_cells, duration)) #Initialize to store results: methods, number, durationfor m in range(2): #two methods for c in range(num_cells): #each cell for t in range(duration): #every time point pos = paths[m, c, t] dist = distance(ligand_center, pos) all_distance[m, c, t] = distall_dist_avg = np.mean(all_distance, axis = 1) #Calculate average over cells, array of shape (2,duration,)all_dist_std = np.std(all_distance, axis = 1) #Calculate the standard deviation, array of shape (2,duration,)Then we plot the average distance to the center vs. time, as in the previous tutorial.#Below are all for plotting purposes#Define the colors to usecolors1 = colorspace.qualitative_hcl(h=[0, 200.], c = 60, l = 70, palette = "dynamic")(2)xs = np.arange(0, duration) #Set the x-axis for plot: time points. Array of integers of shape (duration,)fig, ax = plt.subplots(1, figsize = (10, 8)) #Initialize the plot with 1*1 subplot of size 10*8for m in range(2): #Get the result for this strategy mu, sig = all_dist_avg[m], all_dist_std[m] #Plot average distance vs. time ax.plot(xs, mu, lw=2, label="{}".format(methods[m]), color=colors1[m]) #Fill in average +/- one standard deviation vs. 
time ax.fill_between(xs, mu + sig, mu - sig, color = colors1[m], alpha=0.15)ax.set_title("Average distance to highest concentration")ax.set_xlabel('time (s)')ax.set_ylabel('distance to center (µm)')ax.hlines(0, 0, duration, colors='gray', linestyles='dashed', label='concentration 10^8')ax.legend(loc='upper right', ncol = 2, fontsize = 15)ax.grid()Which strategy is more efficient? Return to main text. Saragosti J., Silberzan P., Buguin A. 2012. Modeling E. coli tumbles by rotational diffusion. Implications for chemotaxis. PLoS One 7(4):e35412. available online. ↩ "
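The run-length adjustment at the heart of the chemotactic strategy above can also be sketched as a standalone, testable helper. This is an illustrative re-derivation under the tutorial's stated assumptions (saturation level, 10x gain on the relative change, a cap at 4x the baseline run); the names mirror the notebook loosely but this is not its exact code:

```python
import numpy as np

# Illustrative sketch of the run-length rule described in the tutorial above.
SATURATION_CONC = 1e10  # receptors cannot sense concentrations above this

def adjusted_mean_run_time(curr_conc, past_conc, run_time_expected):
    """Mean run duration after adjusting for the relative concentration change."""
    curr = min(curr_conc, SATURATION_CONC)
    past = min(past_conc, SATURATION_CONC)
    change = (curr - past) / past  # relative change, normalizes the exponential gradient
    adj = run_time_expected * (1 + 10 * change)
    # Clamp: wait times stay positive, and runs are capped at 4x the baseline
    return min(max(adj, 1e-6), 4 * run_time_expected)

def sample_run_duration(curr_conc, past_conc, run_time_expected, rng=None):
    """Draw one run duration from an exponential with the adjusted mean."""
    rng = rng or np.random.default_rng()
    return rng.exponential(adjusted_mean_run_time(curr_conc, past_conc, run_time_expected))
```

With run_time_expected = 1.0, a 10% rise in concentration doubles the expected run to 2.0 s, while a 10% drop hits the positive floor, i.e. a near-immediate tumble.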
} ,
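The quantitative comparison in the tutorial above reduces each trajectory to a distance-to-center curve via a triple loop over methods, cells, and time points. That reduction can be sketched in a vectorized form (array shapes as in the tutorial; the function name is illustrative, not the notebook's):

```python
import numpy as np

# Collapse trajectories of shape (methods, cells, time, 2) into the
# per-method mean and standard deviation of the distance to the gradient center.
def distance_stats(paths, center):
    dists = np.linalg.norm(paths - np.asarray(center, dtype=float), axis=-1)  # (methods, cells, time)
    return dists.mean(axis=1), dists.std(axis=1)  # each of shape (methods, time)
```

This is equivalent to filling all_distance cell by cell and then averaging over axis 1, as the tutorial's loop does.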
{
"title" : "v1.0.3 Short Fuse",
"category" : "",
"tags" : "",
"url" : "/node_modules/fuzzysearch/CHANGELOG.html",
"date" : "",
"content" : "v1.0.3 Short Fuse Improved circuit-breaker when needle and haystack length are equalv1.0.2 Vodka Tonic Slightly updated circuit-breaker that tests for equal length first Doubled method performance (see jsperf tests)v1.0.1 Circuit Breaker Introduced a circuit-breaker where queries longer than the searched string will return false Introduced a circuit-breaker where queries identical to the searched string will return true Introduced a circuit-breaker where text containing the entire query will return truev1.0.0 IPO Initial Public Release"
} ,
{
"title" : "fuzzysearch",
"category" : "",
"tags" : "",
"url" : "/node_modules/fuzzysearch/",
"date" : "",
"content" : "fuzzysearch Tiny and blazing-fast fuzzy search in JavaScriptFuzzy searching allows for flexibly matching a string with partial input, useful for filtering data very quickly based on lightweight user input.DemoTo see fuzzysearch in action, head over to bevacqua.github.io/horsey, which is a demo of an autocomplete component that uses fuzzysearch to filter out results based on user input.InstallFrom npmnpm install --save fuzzysearchfuzzysearch(needle, haystack)Returns true if needle matches haystack using a fuzzy-searching algorithm. Note that this program doesn’t implement levenshtein distance, but rather a simplified version where there’s no approximation. The method will return true only if each character in the needle can be found in the haystack and occurs after the preceding character.fuzzysearch('twl', 'cartwheel') // <- truefuzzysearch('cart', 'cartwheel') // <- truefuzzysearch('cw', 'cartwheel') // <- truefuzzysearch('ee', 'cartwheel') // <- truefuzzysearch('art', 'cartwheel') // <- truefuzzysearch('eeel', 'cartwheel') // <- falsefuzzysearch('dog', 'cartwheel') // <- falseAn exciting application for this kind of algorithm is to filter options from an autocomplete menu, check out horsey for an example on how that might look like.But! RegExps…!LicenseMIT"
} ,
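The matching rule the fuzzysearch entry above describes — every character of the needle must appear in the haystack, each after the previous match — is compact enough to sketch. The library itself is JavaScript; this is an illustrative Python transcription of the stated rule, not the package's code:

```python
def fuzzy_match(needle, haystack):
    """Return True if needle's characters occur in haystack in order (gaps allowed)."""
    pos = 0
    for ch in needle:
        pos = haystack.find(ch, pos)  # search only after the previous match
        if pos == -1:
            return False
        pos += 1
    return True
```

This reproduces the README's examples: 'twl' matches 'cartwheel', while 'eeel' does not, since 'cartwheel' contains only two e's.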
{
"title" : "Simple-Jekyll-Search",
"category" : "",
"tags" : "",
"url" : "/node_modules/simple-jekyll-search/",
"date" : "",
"content" : "# [Simple-Jekyll-Search](https://www.npmjs.com/package/simple-jekyll-search)[](https://travis-ci.org/christian-fei/Simple-Jekyll-Search)[](https://david-dm.org/christian-fei/Simple-Jekyll-Search)[](https://david-dm.org/christian-fei/Simple-Jekyll-Search?type=dev)A JavaScript library to add search functionality to any Jekyll blog.## Use caseYou have a blog, built with Jekyll, and want a **lightweight search functionality** on your blog, purely client-side?*No server configurations or databases to maintain*.Just **5 minutes** to have a **fully working searchable blog**.---## Installation### npm```shnpm install simple-jekyll-search```## Getting started### Create `search.json`Place the following code in a file called `search.json` in the **root** of your Jekyll blog. (You can also get a copy [from here](/example/search.json))This file will be used as a small data source to perform the searches on the client side:```yaml---layout: none---[ {% for post in site.posts %} { "title" : "{{ post.title | escape }}", "category" : "{{ post.category }}", "tags" : "{{ post.tags | join: ', ' }}", "url" : "{{ site.baseurl }}{{ post.url }}", "date" : "{{ post.date }}" } {% unless forloop.last %},{% endunless %} {% endfor %}]```## Preparing the plugin### Add DOM elementsSimpleJekyllSearch needs two `DOM` elements to work:- a search input field- a result container to display the results#### Give me the codeHere is the code you can use with the default configuration:You need to place the following code within the layout where you want the search to appear. 
(See the configuration section below to customize it)For example in **_layouts/default.html**:```html<!-- HTML elements for search --><input type="text" id="search-input" placeholder="Search blog posts.."><ul id="results-container"></ul><script src="https://unpkg.com/simple-jekyll-search@latest/dest/simple-jekyll-search.min.js"></script>```## UsageCustomize SimpleJekyllSearch by passing in your configuration options:```jsvar sjs = SimpleJekyllSearch({ searchInput: document.getElementById('search-input'), resultsContainer: document.getElementById('results-container'), json: '/search.json'})```### returns { search }A new instance of SimpleJekyllSearch returns an object, with the only property `search`.`search` is a function used to simulate a user input and display the matching results. E.g.:```jsvar sjs = SimpleJekyllSearch({ ...options })sjs.search('Hello')```💡 it can be used to filter posts by tags or categories!## OptionsHere is a list of the available options, usage questions, troubleshooting & guides.### searchInput (Element) [required]The input element on which the plugin should listen for keyboard events and trigger the searching and rendering of articles.### resultsContainer (Element) [required]The container element in which the search results should be rendered. 
Typically a `<ul>`.### json (String|JSON) [required]You can either pass in a URL to the `search.json` file, or the results in the form of JSON directly, to save one round trip to get the data.### searchResultTemplate (String) [optional]The template of a single rendered search result.The templating syntax is very simple: You just enclose the properties you want to replace with curly braces.E.g.The template```jsvar sjs = SimpleJekyllSearch({ searchInput: document.getElementById('search-input'), resultsContainer: document.getElementById('results-container'), json: '/search.json', searchResultTemplate: '{title}'})```will render to the following```htmlWelcome to Jekyll!```If the `search.json` contains this data```json[ { "title" : "Welcome to Jekyll!", "category" : "", "tags" : "", "url" : "/jekyll/update/2014/11/01/welcome-to-jekyll.html", "date" : "2014-11-01 21:07:22 +0100" }]```### templateMiddleware (Function) [optional]A function that will be called whenever a match in the template is found.It gets passed the current property name, property value, and the template.If the function returns a non-undefined value, it gets replaced in the template.This can be potentially useful for manipulating URLs etc.Example:```jsSimpleJekyllSearch({ ... templateMiddleware: function(prop, value, template) { if (prop === 'bar') { return value.replace(/^\//, '') } } ...})```See the [tests](https://github.com/christian-fei/Simple-Jekyll-Search/blob/master/tests/Templater.test.js) for an in-depth code example### sortMiddleware (Function) [optional]A function that will be used to sort the filtered results.It can be used for example to group the sections together.Example:```jsSimpleJekyllSearch({ ... 
sortMiddleware: function(a, b) { var astr = String(a.section) + "-" + String(a.caption); var bstr = String(b.section) + "-" + String(b.caption); return astr.localeCompare(bstr) } ...})```### noResultsText (String) [optional]The HTML that will be shown if the query didn't match anything.### limit (Number) [optional]You can limit the number of posts rendered on the page.### fuzzy (Boolean) [optional]Enable fuzzy search to allow less restrictive matching.### exclude (Array) [optional]Pass in a list of terms you want to exclude (terms will be matched against a regex, so URLs, words are allowed).### success (Function) [optional]A function called once the data has been loaded.### debounceTime (Number) [optional]Limit how many times the search function can be executed over the given time window. This is especially useful to improve the user experience when searching over a large dataset (either with rare terms or because the number of posts to display is large). If no `debounceTime` (milliseconds) is provided a search will be triggered on each keystroke.---## If search isn't working due to invalid JSON- There is a filter plugin in the _plugins folder which should remove most characters that cause invalid JSON. 
To use it, add the simple_search_filter.rb file to your _plugins folder, and use `remove_chars` as a filter.For example: in search.json, replace```json"content": "{{ page.content | strip_html | strip_newlines }}"```with```json"content": "{{ page.content | strip_html | strip_newlines | remove_chars | escape }}"```If this doesn't work when using Github pages you can try `jsonify` to make sure the content is json compatible:```js"content": {{ page.content | jsonify }}```**Note: you don't need to use quotes `"` in this since `jsonify` automatically inserts them.**## Enabling full-text searchReplace `search.json` with the following code:```yaml---layout: none---[ {% for post in site.posts %} { "title" : "{{ post.title | escape }}", "category" : "{{ post.category }}", "tags" : "{{ post.tags | join: ', ' }}", "url" : "{{ site.baseurl }}{{ post.url }}", "date" : "{{ post.date }}", "content" : "{{ post.content | strip_html | strip_newlines }}" } {% unless forloop.last %},{% endunless %} {% endfor %} , {% for page in site.pages %} { {% if page.title != nil %} "title" : "{{ page.title | escape }}", "category" : "{{ page.category }}", "tags" : "{{ page.tags | join: ', ' }}", "url" : "{{ site.baseurl }}{{ page.url }}", "date" : "{{ page.date }}", "content" : "{{ page.content | strip_html | strip_newlines }}" {% endif %} } {% unless forloop.last %},{% endunless %} {% endfor %}]```## Development- `npm install`- `npm test`#### Acceptance tests```bashcd example; jekyll serve# in another tabnpm run cypress -- run```## ContributorsThanks to all [contributors](https://github.com/christian-fei/Simple-Jekyll-Search/graphs/contributors) over the years! 
You are the best :)> [@daviddarnes](https://github.com/daviddarnes)[@XhmikosR](https://github.com/XhmikosR)[@PeterDaveHello](https://github.com/PeterDaveHello)[@mikeybeck](https://github.com/mikeybeck)[@egladman](https://github.com/egladman)[@midzer](https://github.com/midzer)[@eduardoboucas](https://github.com/eduardoboucas)[@kremalicious](https://github.com/kremalicious)[@tibotiber](https://github.com/tibotiber)and many others!## Stargazers over time[](https://starchart.cc/christian-fei/Simple-Jekyll-Search)"
}
]
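Since invalid JSON is the failure mode the Simple-Jekyll-Search README above spends the most time on, it can help to sanity-check a generated search.json offline before wiring up the plugin. This helper is illustrative, not part of the library:

```python
import json

def check_search_json(text, required=("title", "url")):
    """Parse a generated search.json; return (entry count, indices of entries missing required keys)."""
    entries = json.loads(text)  # raises json.JSONDecodeError if Liquid output broke the JSON
    missing = [i for i, entry in enumerate(entries)
               if not all(key in entry for key in required)]
    return len(entries), missing

# A tiny stand-in for a generated file; note the empty object, as produced
# by the full-text template for pages without a title.
sample = '[{"title": "Welcome to Jekyll!", "url": "/jekyll/update/2014/11/01/welcome-to-jekyll.html"}, {}]'
```

Running check_search_json(sample) reports two entries, with entry 1 missing the rendered keys; point it at the built file (e.g. _site/search.json after jekyll build) to catch escaping problems early.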